Generative models for 3D drug design have gained prominence recently for their potential to design ligands directly within protein pockets. Current approaches, however, often suffer from very slow sampling or generate molecules with poor chemical validity. Addressing these limitations, we propose Semla, a scalable E(3)-equivariant message passing architecture. We further introduce a molecular generation model, SemlaFlow, which is trained using flow matching along with scale optimal transport, a novel extension of equivariant optimal transport. Our model produces state-of-the-art results on benchmark datasets with just 100 sampling steps. Crucially, SemlaFlow samples high-quality molecules with as few as 20 steps, corresponding to a speed-up of two orders of magnitude over the state of the art, without sacrificing performance. Furthermore, we highlight limitations of current evaluation methods for 3D generation and propose new benchmark metrics for unconditional molecular generators. Finally, using these new metrics, we compare our model's ability to generate high-quality samples against current approaches and further demonstrate SemlaFlow's strong performance.