The materials informatics reproducibility crisis - let's talk about it
We keep publishing impressive MAE numbers, but independent groups still struggle to reproduce even basic baselines. Missing random seeds, undocumented filtering, and leakage across compositional families are everywhere. This does not just hurt credibility; it wastes months of graduate student time and inflates confidence in fragile models.
I think conferences and journals should require executable artifacts, data version hashes, and explicit split definitions for materials datasets. If software engineering standards are optional, materials informatics will keep reinventing avoidable mistakes. What lightweight standards could the community realistically adopt this year?
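For concreteness, a data version hash can be as simple as a SHA-256 digest of the exact file the models were trained on, recorded next to the reported results. A minimal sketch (the file name is just a placeholder):

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical dataset file; report this digest alongside the MAE numbers.
print(sha256_of_file("formation_energies.csv"))
```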
Posted by Anonymous Researcher
Comments
Journals should require data split manifests and a seed file. Without those, reported improvements are hard to interpret.
Agreed. We started publishing split JSON files and got fewer replication questions immediately.
Would you share your template? A common schema would help tooling and benchmark comparability.
Happy to. We store composition family IDs, prototype labels, and source dataset hashes in one manifest.
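Not their actual template, but a minimal sketch of what such a manifest could contain; every field name and value below is a placeholder rather than a proposed standard:

```python
import json

# Illustrative split manifest: provenance for source data plus one record per entry.
manifest = {
    "source_datasets": {
        "formation_energies.csv": "sha256:3b4f...",  # truncated placeholder digest
    },
    "random_seed": 42,
    "entries": [
        {"id": "entry-0001", "composition_family": "ABX3",  "prototype": "perovskite", "split": "train"},
        {"id": "entry-0002", "composition_family": "A2BX6", "prototype": "K2PtCl6",    "split": "test"},
    ],
}

with open("split_manifest.json", "w") as fh:
    json.dump(manifest, fh, indent=2)
```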
Conference artifact tracks could help. Even a lightweight executable notebook requirement would raise the baseline.
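One way an artifact track could check that requirement is to re-execute the submitted notebook end to end with pinned parameters, for example via papermill. A sketch, with paths and parameter names as placeholders:

```python
import papermill as pm

# Re-run the submitted notebook with a fixed seed and the published split manifest,
# keeping the executed copy as the reviewable artifact.
pm.execute_notebook(
    "train_baseline.ipynb",           # hypothetical submitted notebook
    "train_baseline_rerun.ipynb",     # executed copy produced by the check
    parameters={"random_seed": 42, "split_manifest": "split_manifest.json"},
)
```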
In industry we often cannot release raw data, but we can release synthetic validation sets and exact preprocessing pipelines.
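A sketch of that idea: publish the preprocessing as a serializable scikit-learn Pipeline so others can apply identical steps to a synthetic validation set, even when the raw data stays private. The specific steps below are assumptions, not anyone's actual pipeline:

```python
import joblib
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Every preprocessing choice is declared explicitly instead of living in a script.
preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Fit on the private data, then release the fitted pipeline (or its exact config).
# preprocess.fit(X_private)
joblib.dump(preprocess, "preprocessing_pipeline.joblib")
```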
Could community challenges enforce leakage-resistant splits by default? Kaggle-style leaderboards could create real pressure for better standards.
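For what a leakage-resistant default might look like: group the split by composition family so no family appears on both sides. A minimal sketch with synthetic placeholder data:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # placeholder features
y = rng.normal(size=8)        # placeholder targets
families = ["ABX3", "ABX3", "A2BX6", "A2BX6", "AB", "AB", "A3B", "A3B"]

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=families))

# Whole families land on one side of the split, so compositionally related
# entries cannot leak from train into test.
assert set(np.array(families)[train_idx]).isdisjoint(np.array(families)[test_idx])
```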