To address pressing scientific challenges such as climate change, increasingly sophisticated generative artificial intelligence models are being developed that can efficiently sample the large chemical space of possible functional materials. These models can quickly sample new chemical compositions paired with crystal structures. They are typically evaluated using uniqueness and novelty metrics, which depend on a chosen crystal distance function. However, the most prevalent distance function has four limitations: it fails to quantify the degree of similarity between compounds, cannot distinguish compositional differences from structural differences, lacks Lipschitz continuity with respect to shifts in atomic coordinates, and yields a uniqueness metric that is not invariant under permutations of the generated samples. In this work, we propose using two continuous distance functions to evaluate uniqueness and novelty, which theoretically overcome these limitations. Our experiments show that these distances reveal insights missed by traditional distance functions, providing a more reliable basis for evaluating and comparing generative models for inorganic crystals.
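The permutation issue named above can be illustrated with a toy sketch. It is not the evaluation code of this work: crystals are replaced by points on a line, the binary match rule is a simple tolerance test, and the names `greedy_uniqueness`, `mean_pairwise_distance`, and `TOL` are hypothetical. The continuous score shown is only one possible order-independent quantity, assumed here for illustration, and is not necessarily the metric proposed in this work.

```python
from itertools import combinations


def greedy_uniqueness(samples, tol):
    """Fraction of samples kept by greedy deduplication with a binary match."""
    kept = []
    for x in samples:
        # A sample is a duplicate if it matches any previously kept sample.
        if not any(abs(x - y) <= tol for y in kept):
            kept.append(x)
    return len(kept) / len(samples)


def mean_pairwise_distance(samples):
    """Average distance over all unordered pairs (hypothetical continuous score)."""
    pairs = list(combinations(samples, 2))
    return sum(abs(x - y) for x, y in pairs) / len(pairs)


TOL = 0.7  # assumed matching tolerance for the toy binary distance
order_a = [0.0, 0.6, 1.2]
order_b = [0.6, 0.0, 1.2]  # same samples, different order

# The tolerance match is not transitive (0.0 ~ 0.6 and 0.6 ~ 1.2, but not
# 0.0 ~ 1.2), so the greedy result depends on which sample is seen first.
u_a = greedy_uniqueness(order_a, TOL)
u_b = greedy_uniqueness(order_b, TOL)

# A score built from all pairwise distances does not depend on the order.
d_a = mean_pairwise_distance(order_a)
d_b = mean_pairwise_distance(order_b)

print(u_a, u_b, d_a, d_b)
```

In this sketch the two orderings give greedy uniqueness values of 2/3 and 1/3 for the same set of samples, while the mean pairwise distance is the same for both.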