Designing molecules that must satisfy multiple, often conflicting objectives is a central challenge in molecular discovery. The enormous size of chemical space and the cost of high-fidelity simulations have driven the development of machine-learning-guided strategies for accelerating design with limited data. Among these, Bayesian optimization (BO) offers a principled framework for sample-efficient search, while generative models provide a mechanism for proposing novel, diverse candidates beyond fixed libraries. However, existing methods that couple the two often rely on continuous latent spaces, a reliance that introduces both architectural entanglement and scalability challenges. This work introduces an alternative, modular "generate-then-optimize" framework for de novo multi-objective molecular design. At each iteration, a generative model constructs a large, diverse pool of candidate molecules, after which a novel acquisition function, qPMHI (multi-point Probability of Maximum Hypervolume Improvement), selects the batch of candidates most likely to induce the largest expansion of the Pareto front. The key insight is that qPMHI decomposes additively, so exact, scalable batch selection reduces to ranking per-candidate probabilities that are easily estimated by Monte Carlo sampling. We benchmark the framework against state-of-the-art latent-space and discrete molecular optimization methods, demonstrating significant improvements across synthetic benchmarks and application-driven tasks. Specifically, in a case study related to sustainable energy storage, we show that our approach quickly uncovers novel, diverse, and high-performing organic (quinone-based) cathode materials for aqueous redox flow battery applications.
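To make the ranking idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes two objectives that are both maximized, a hypothetical array `samples` of joint posterior draws over the candidate pool (for example from a surrogate model), and a fixed reference point `ref`. The function names `hypervolume_2d`, `estimate_pmhi`, and `select_batch` are illustrative only. Under one reading of the abstract, each candidate's probability is the fraction of draws in which it attains the largest hypervolume improvement, and the batch is the top q candidates by that probability. Draws in which no candidate improves the front are simply not counted, which is an assumption of this sketch.

```python
import numpy as np

def hypervolume_2d(points, ref):
    """Area dominated by `points` (2 objectives, maximization) relative to `ref`."""
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    pts = pts[np.all(pts > ref, axis=1)]      # keep points that dominate ref
    if len(pts) == 0:
        return 0.0
    pts = pts[np.argsort(-pts[:, 0])]         # sweep from largest f1 downward
    hv, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        if f2 > best_f2:                      # point adds a new horizontal strip
            hv += (f1 - ref[0]) * (f2 - best_f2)
            best_f2 = f2
    return hv

def estimate_pmhi(samples, front, ref):
    """Monte Carlo estimate of P(candidate i has the maximum hypervolume improvement).

    samples: (S, N, 2) array of joint posterior draws for N candidates (assumed input).
    front:   (M, 2) array holding the current Pareto front.
    """
    n_draws, n_cands, _ = samples.shape
    base = hypervolume_2d(front, ref)
    wins = np.zeros(n_cands)
    for s in range(n_draws):
        hvi = np.array([
            hypervolume_2d(np.vstack([front, samples[s, i]]), ref) - base
            for i in range(n_cands)
        ])
        if hvi.max() > 1e-12:                 # assumption: skip draws with no improvement
            wins[np.argmax(hvi)] += 1
    return wins / n_draws

def select_batch(probs, q):
    """Pick the q candidates with the highest estimated probabilities."""
    return np.argsort(-probs)[:q]

# Toy usage with synthetic posterior draws (illustrative values only).
rng = np.random.default_rng(0)
front = np.array([[1.0, 2.0], [2.0, 1.0]])
ref = np.array([0.0, 0.0])
means = np.array([[2.0, 2.0], [0.5, 0.5], [1.5, 2.5], [2.5, 1.5], [1.0, 1.0]])
samples = means + 0.3 * rng.standard_normal((200, 5, 2))
probs = estimate_pmhi(samples, front, ref)
print(probs, select_batch(probs, q=2))
```

Because at most one candidate is the maximizer in any single draw, the per-candidate events are mutually exclusive, so the probability that a batch contains the maximizer is the sum of its members' probabilities. That is why a plain top-q ranking suffices in this sketch, with no combinatorial search over batches.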