The efficient exploration of chemical space to design molecules with intended properties enables the accelerated discovery of drugs, materials, and catalysts, and is one of the most important outstanding challenges in chemistry. Encouraged by the recent surge in computing power and advances in artificial intelligence, researchers have developed many algorithms to tackle this problem. However, despite the emergence of many new approaches in recent years, comparatively little progress has been made in developing realistic benchmarks that reflect the complexity of molecular design for real-world applications. In this work, we develop a set of practical benchmark tasks that rely on physical simulation of molecular systems and mimic real-life molecular design problems for materials, drugs, and chemical reactions. Additionally, we demonstrate the utility and ease of use of our new benchmark set by showing how to compare the performance of several well-established families of algorithms. Surprisingly, we find that model performance can strongly depend on the benchmark domain. We believe that our benchmark suite will help move the field towards more realistic molecular design benchmarks and bring the development of inverse molecular design algorithms closer to designing molecules that solve existing problems in both academia and industry.