Generative models for 3D drug design have recently gained prominence for their potential to design ligands directly within protein pockets. Current approaches, however, often suffer from slow sampling or generate molecules with poor chemical validity. To address these limitations, we propose Semla, a scalable E(3)-equivariant message passing architecture. We further introduce a molecular generation model, MolFlow, which is trained using flow matching together with scale optimal transport, a novel extension of equivariant optimal transport. Our model produces state-of-the-art results on benchmark datasets with just 100 sampling steps. Crucially, MolFlow samples high-quality molecules with as few as 20 steps, corresponding to a speed-up of two orders of magnitude over the state of the art, without sacrificing performance. Furthermore, we highlight limitations of current evaluation methods for 3D generation and propose new benchmark metrics for unconditional molecular generators. Finally, using these new metrics, we compare our model's ability to generate high-quality samples against current approaches and further demonstrate MolFlow's strong performance.
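To make the notion of "sampling steps" concrete, the following is a minimal sketch of how a flow-matching model is typically sampled: a velocity field is integrated from noise at t = 0 to data at t = 1 with a fixed number of Euler steps. This is not the MolFlow implementation. The names `euler_sample` and `make_toy_velocity` are hypothetical, and the toy velocity field (a straight-line flow towards a fixed target point cloud) merely stands in for a trained equivariant network.

```python
import numpy as np


def euler_sample(velocity, x0, n_steps):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # t never reaches 1, so (1 - t) below stays positive
        x = x + dt * velocity(x, t)
    return x


def make_toy_velocity(target):
    """Hypothetical stand-in for a trained network.

    Returns the velocity field of a straight-line path that ends at `target`.
    """
    def velocity(x, t):
        return (target - x) / (1.0 - t)
    return velocity


rng = np.random.default_rng(0)
target = rng.normal(size=(5, 3))  # assumed toy "molecule": 5 atoms in 3D
noise = rng.normal(size=(5, 3))   # starting sample drawn from a Gaussian prior

velocity = make_toy_velocity(target)
coarse = euler_sample(velocity, noise, n_steps=20)
fine = euler_sample(velocity, noise, n_steps=100)
```

In this toy setting the path is exactly straight, so both step counts land on the target. A learned velocity field is generally not straight, which is why the number of integration steps becomes a trade-off between sampling cost and sample quality.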