Billions of organic molecules are known, but only a tiny fraction of functional inorganic materials has been discovered, a gap that is particularly pressing for the community searching for new quantum materials. Recent advances in machine-learning-based generative models, particularly diffusion models, show great promise for generating new, stable materials. However, integrating geometric patterns into materials generation remains a challenge. Here, we introduce Structural Constraint Integration in the GENerative model (SCIGEN). Our approach can modify any trained generative diffusion model: prior to each diffusion step, it masks the denoised structure with a diffused constrained structure, steering the generation toward constrained outputs. Furthermore, we mathematically prove that SCIGEN effectively performs conditional sampling from the original distribution, which is crucial for generating stable constrained materials. We generate eight million compounds using Archimedean lattices as prototype constraints, with over 10% surviving a multistage stability pre-screening. High-throughput density functional theory (DFT) calculations on 26,000 surviving compounds show that over 50% pass structural optimization at the DFT level. Since the properties of quantum materials are closely related to geometric patterns, our results indicate that SCIGEN provides a general framework for generating quantum materials candidates.
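The masking step described above can be illustrated with a minimal sketch. This is not the SCIGEN implementation: it assumes a simple variance-preserving Gaussian diffusion on plain coordinate arrays, whereas a real crystal model would also handle lattice parameters, atom types, and periodic fractional coordinates. The names `diffuse`, `toy_denoise_step`, and `constrained_sample` are hypothetical, the denoiser is a stand-in for a trained model, and the final overwrite with the clean constraint is an added assumption.

```python
import numpy as np

def diffuse(x0, alpha_bar_t, rng):
    """Forward-diffuse clean values x0 to a given noise level.

    Assumes the variance-preserving form
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise.
    """
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise

def toy_denoise_step(x_t, t, rng):
    """Hypothetical stand-in for one reverse step of a trained model."""
    return 0.9 * x_t + 0.01 * rng.standard_normal(x_t.shape)

def constrained_sample(x_constraint, mask, alpha_bars, denoise_step, rng):
    """Sample with masked entries steered toward a constraint structure.

    x_constraint : array of clean constrained values (ignored where mask is False)
    mask         : boolean array, True where the constraint fixes a value
    alpha_bars   : cumulative noise schedule, index 0 = least noisy
    denoise_step : callable (x_t, t, rng) -> x_{t-1}
    """
    x = rng.standard_normal(x_constraint.shape)  # start from pure noise
    for t in reversed(range(len(alpha_bars))):
        # Diffuse the constraint to the current noise level ...
        x_c = diffuse(x_constraint, alpha_bars[t], rng)
        # ... and overwrite the constrained entries before the reverse step.
        x = np.where(mask, x_c, x)
        x = denoise_step(x, t, rng)
    # Assumption of this sketch: impose the clean constraint on the final sample.
    return np.where(mask, x_constraint, x)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Four atoms with 3D coordinates; the first two are fixed by the constraint.
    x_constraint = np.zeros((4, 3))
    x_constraint[:2] = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
    mask = np.zeros((4, 3), dtype=bool)
    mask[:2] = True
    alpha_bars = np.linspace(0.999, 0.01, 50)
    sample = constrained_sample(x_constraint, mask, alpha_bars, toy_denoise_step, rng)
    print(sample)
```

Because the constrained entries are re-noised to the same level as the rest of the sample at every step, the denoiser always sees an input at a consistent noise level, while the unconstrained atoms remain free to adapt to the fixed ones.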