The reproducibility crisis is particularly bad in NLP.

The idea is simple: automatically determine which papers are worth the effort of reproducing.

This could be done either by training a model on labeled examples or by using a metric agreed on by the ACL community (e.g., citation count, potential for reproducibility errors, etc.).
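As a minimal sketch of the metric-based route, a priority score could combine a few community-agreed signals. The features, weights, and names below are purely hypothetical illustrations, not a proposed standard:

```python
from dataclasses import dataclass

@dataclass
class Paper:
    title: str
    citations: int           # citation count from some index
    code_released: bool      # whether an official implementation exists
    times_reproduced: int    # known reproduction attempts so far

def repro_priority(paper: Paper) -> float:
    """Toy priority score: highly cited, never-reproduced papers
    without released code rank highest. Weights are arbitrary."""
    citation_signal = min(paper.citations / 100.0, 1.0)  # cap citation influence
    risk_signal = 0.0 if paper.code_released else 0.5    # no code -> riskier claim
    novelty_signal = 1.0 / (1.0 + paper.times_reproduced)
    return citation_signal + risk_signal + novelty_signal

papers = [
    Paper("A", citations=450, code_released=False, times_reproduced=0),
    Paper("B", citations=40, code_released=True, times_reproduced=3),
]
for p in sorted(papers, key=repro_priority, reverse=True):
    print(p.title, round(repro_priority(p), 2))
```

The same signals could instead serve as features for a learned ranker if training data (e.g., past reproduction successes and failures) were available.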

Papers with high reproducibility scores that have not yet been reproduced enough times could then get their own track at ACL conferences and be reviewed accordingly.