Molecular structure generation is a fundamental problem: determining the 3D positions of a molecule's constituents. It has important biological applications, such as molecular docking, protein folding, and molecular design. Recent generative modeling approaches, such as diffusion models and flow matching, have made substantial progress on these tasks by modeling molecular conformations as a distribution. In this work, we focus on flow matching and adopt an energy-based perspective to improve both training and inference of structure generation models. This view yields a mapping function, represented by a deep network, that is learned directly to \textit{iteratively} map random configurations, i.e., samples from the source distribution, to target structures, i.e., points on the data manifold. The result is a conceptually simple and empirically effective flow matching setup that is theoretically justified and connects to fundamental properties such as idempotency and stability, as well as to empirically useful techniques such as structure refinement in AlphaFold. Experiments on protein docking and protein backbone generation consistently demonstrate the method's effectiveness: it outperforms recent task-specific flow matching and diffusion model baselines under a similar computational budget.
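To make the idea of an iteratively applied, idempotent mapping concrete, the following is a minimal toy sketch and not the method proposed in this work. It assumes a hand-written energy whose minimum set is the unit circle in 2D (a stand-in for the data manifold) and replaces the learned deep network with a plain gradient step on that energy. The names `energy`, `refine_step`, and `iterate_map`, as well as the step size and iteration count, are hypothetical choices made for illustration only.

```python
import numpy as np


def energy(x):
    # Toy energy (an assumption for illustration): zero exactly on the unit
    # circle, which plays the role of the "data manifold" here.
    r = np.linalg.norm(x, axis=-1)
    return 0.5 * (r - 1.0) ** 2


def refine_step(x, step_size=0.5):
    # One gradient-descent step on the toy energy. In a learned model this
    # hand-written update would be replaced by a trained network.
    r = np.linalg.norm(x, axis=-1, keepdims=True)
    grad = (r - 1.0) * x / np.maximum(r, 1e-12)  # gradient of energy w.r.t. x
    return x - step_size * grad


def iterate_map(x, num_steps=20, step_size=0.5):
    # Apply the same mapping repeatedly to carry source samples to the manifold.
    for _ in range(num_steps):
        x = refine_step(x, step_size)
    return x


rng = np.random.default_rng(0)
x0 = rng.normal(size=(256, 2))   # samples from a Gaussian source distribution
x_final = iterate_map(x0)        # iteratively mapped toward the unit circle

# Idempotency check: applying the map to a converged point changes it very little.
idempotency_gap = np.abs(refine_step(x_final) - x_final).max()
print("max energy after iteration:", energy(x_final).max())
print("idempotency gap:", idempotency_gap)
```

In this toy setting, points on the circle are fixed points of `refine_step`, so the map is idempotent on the manifold, and each step halves the distance to it, which is one simple notion of stability.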