Generative machine learning (ML) models hold great promise for accelerating materials discovery through the inverse design of inorganic crystals, enabling an unprecedented exploration of chemical space. Yet the lack of standardized evaluation frameworks makes it difficult to assess, compare, and further develop these ML models meaningfully. In this work, we introduce LeMat-GenBench, a unified benchmark for generative models of crystalline materials, supported by a set of evaluation metrics designed to better inform model development and downstream applications. We release both an open-source evaluation suite and a public leaderboard on Hugging Face, and benchmark 12 recent generative models. Results reveal that, on average, gains in stability come at the cost of novelty and diversity, with no model excelling across all dimensions. Altogether, LeMat-GenBench establishes a reproducible and extensible foundation for fair model comparison and aims to guide the development of more reliable, discovery-oriented generative models for crystalline materials.