The prediction of chemical reactions has gained significant interest within the machine learning community in recent years, owing to its complexity and crucial applications in chemistry. However, model evaluation for this task has been mostly limited to simple metrics like top-k accuracy, which obscures the fine details of a model's limitations. Inspired by progress in other fields, we propose a new assessment scheme that builds on current approaches and steers towards a more holistic evaluation. We introduce two key components for this goal: CHORISO, a curated dataset with multiple tailored splits that recreate chemically relevant scenarios, and a collection of metrics that provide a comprehensive view of a model's advantages and limitations. Applying this scheme to state-of-the-art models reveals important differences on sensitive fronts, especially stereoselectivity and chemical out-of-distribution generalization. Our work paves the way towards robust prediction models that can ultimately accelerate chemical discovery.