The efficient exploration of chemical space to design molecules with intended properties enables the accelerated discovery of drugs, materials, and catalysts, and is one of the most important outstanding challenges in chemistry. Encouraged by the recent surge in computing power and advances in artificial intelligence, many algorithms have been developed to tackle this problem. However, despite the emergence of many new approaches in recent years, comparatively little progress has been made in developing realistic benchmarks that reflect the complexity of molecular design for real-world applications. In this work, we develop a set of practical benchmark tasks that rely on physical simulation of molecular systems and mimic real-life molecular design problems for materials, drugs, and chemical reactions. Additionally, we demonstrate the utility and ease of use of our new benchmark set by showing how to compare the performance of several well-established families of algorithms. Surprisingly, we find that model performance can strongly depend on the benchmark domain. We believe that our benchmark suite will help move the field towards more realistic molecular design benchmarks, and move the development of inverse molecular design algorithms closer to designing molecules that solve existing problems in both academia and industry.