Machine-learned interatomic potentials (MLIPs) are deployed for high-throughput materials screening without formal reliability guarantees. We show that a single MLIP used as a stability filter misses 93% of density functional theory (DFT)-stable materials (recall 0.07) on a 25,000-material benchmark. Proof-Carrying Materials (PCM) closes this gap through three stages: adversarial falsification across compositional space, bootstrap envelope refinement with 95% confidence intervals, and formal certification in Lean 4. Auditing CHGNet, TensorNet and MACE reveals architecture-specific blind spots with near-zero pairwise error correlations (r ≤ 0.13; n = 5,000), confirmed by independent Quantum ESPRESSO calculations (20/20 converged; median DFT/CHGNet force ratio 12×). A risk model trained on PCM-discovered features predicts failures on unseen materials (AUC-ROC = 0.938 ± 0.004) and transfers across architectures (cross-MLIP AUC-ROC ≈ 0.70; feature-importance correlation r = 0.877). In a thermoelectric screening case study, PCM-audited protocols discover 62 additional stable materials missed by single-MLIP screening, a 25% improvement in discovery yield.
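To make the headline statistics concrete, the following is a minimal sketch of how three of them could be computed: the recall of an MLIP stability filter against DFT labels, a Pearson correlation between two MLIPs' per-material errors, and a 95% confidence interval from a percentile bootstrap. It is a generic illustration, not PCM's envelope-refinement procedure. The function names (`recall`, `pearson`, `bootstrap_ci`) and the toy labels are hypothetical. The abstract reports "r" without naming the coefficient, so reading it as Pearson's r is also an assumption.

```python
import random
from typing import Callable, Sequence


def recall(dft_stable: Sequence[bool], mlip_stable: Sequence[bool]) -> float:
    """Fraction of DFT-stable materials that the MLIP filter also labels stable."""
    hits = sum(1 for d, m in zip(dft_stable, mlip_stable) if d and m)
    positives = sum(1 for d in dft_stable if d)
    return hits / positives if positives else float("nan")


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, e.g. between two MLIPs' per-material errors (assumed metric)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def bootstrap_ci(
    dft_stable: Sequence[bool],
    mlip_stable: Sequence[bool],
    stat: Callable[[Sequence[bool], Sequence[bool]], float] = recall,
    n_boot: int = 2000,
    level: float = 0.95,
    seed: int = 0,
) -> tuple[float, float]:
    """Generic percentile-bootstrap interval for a statistic of paired labels."""
    rng = random.Random(seed)
    n = len(dft_stable)
    samples = []
    for _ in range(n_boot):
        # Resample materials with replacement, keeping each (DFT, MLIP) pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        value = stat([dft_stable[i] for i in idx], [mlip_stable[i] for i in idx])
        if value == value:  # skip NaN resamples that drew no DFT-stable material
            samples.append(value)
    samples.sort()
    tail = (1 - level) / 2
    lo = samples[int(tail * (len(samples) - 1))]
    hi = samples[int((1 - tail) * (len(samples) - 1))]
    return lo, hi


# Hypothetical toy labels: 100 DFT-stable materials, of which the MLIP flags only 7,
# plus 50 DFT-unstable materials that the MLIP also rejects.
dft = [True] * 100 + [False] * 50
mlip = [True] * 7 + [False] * 93 + [False] * 50

print(recall(dft, mlip))  # 0.07
print(bootstrap_ci(dft, mlip))
print(pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 1.0
```

Each resample keeps every material's (DFT, MLIP) label pair together, so the interval reflects sampling variability over materials rather than over individual labels.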