Accurate prediction of protein-ligand binding poses is crucial for structure-based drug design, yet existing methods struggle to balance speed, accuracy, and physical plausibility. We introduce Matcha, a molecular docking pipeline that combines multi-stage flow matching with learned scoring and physical validity filtering. Our approach refines docking predictions in three sequential stages, each implemented as a flow matching model operating on appropriate geometric spaces ($\mathbb{R}^3$, $\mathrm{SO}(3)$, and $\mathrm{SO}(2)$). We further improve prediction quality with a dedicated scoring model and apply unsupervised physical validity filters to eliminate unrealistic poses. Compared with a variety of existing approaches, Matcha achieves superior docking success rates and physical plausibility on the Astex and PDBbind test sets. Moreover, our method runs approximately 25 times faster than modern large-scale co-folding models. The model weights and inference code needed to reproduce our results are available at https://github.com/LigandPro/Matcha.