Structure-based and ligand-based computational drug design have traditionally relied on disjoint data sources and modeling assumptions, limiting their joint use at scale. In this work, we introduce Contrastive Geometric Learning for Unified Computational Drug Design (ConGLUDe), a single contrastive geometric model that unifies structure- and ligand-based training. ConGLUDe couples a geometric protein encoder that produces whole-protein representations and implicit embeddings of predicted binding sites with a fast ligand encoder, removing the need for pre-defined pockets. By aligning ligands with both global protein representations and multiple candidate binding sites through contrastive learning, ConGLUDe supports ligand-conditioned pocket prediction in addition to virtual screening and target fishing, while being trained jointly on protein-ligand complexes and large-scale bioactivity data. Across diverse benchmarks, ConGLUDe achieves state-of-the-art zero-shot virtual screening performance in settings where no binding pocket information is provided as input, substantially outperforms existing methods on a challenging target fishing task, and demonstrates competitive ligand-conditioned pocket selection. These results highlight the advantages of unified structure-ligand training and position ConGLUDe as a step toward general-purpose foundation models for drug discovery.
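As an illustration of the contrastive alignment described above, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the loss, so a symmetric InfoNCE objective is assumed here; the names `info_nce` and `select_pocket`, the `temperature` value, and the use of cosine similarity are all hypothetical choices made for this example. The sketch shows (i) a batch-wise contrastive loss between protein-level and ligand embeddings and (ii) ligand-conditioned pocket selection as picking the candidate binding-site embedding most similar to the ligand embedding.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def info_nce(protein_emb, ligand_emb, temperature=0.07):
    """Symmetric InfoNCE loss (assumed form, not confirmed by the source).

    protein_emb, ligand_emb: arrays of shape (B, D); row i of each is a
    matched protein-ligand pair, and all other rows act as negatives.
    """
    p = l2_normalize(protein_emb)
    lig = l2_normalize(ligand_emb)
    logits = p @ lig.T / temperature  # (B, B) similarity matrix

    def cross_entropy_on_diagonal(z):
        # Numerically stable log-softmax over each row.
        z = z - z.max(axis=1, keepdims=True)
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        # The matched pair for row i sits at column i.
        return -np.mean(np.diag(log_probs))

    # Average the protein-to-ligand and ligand-to-protein directions.
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))


def select_pocket(pocket_embs, ligand_emb):
    """Pick the candidate pocket whose embedding best matches the ligand.

    pocket_embs: (K, D) embeddings of K candidate binding sites.
    ligand_emb:  (D,) embedding of one ligand.
    Returns the index of the best-scoring pocket and all K cosine scores.
    """
    scores = l2_normalize(pocket_embs) @ l2_normalize(ligand_emb)
    return int(np.argmax(scores)), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    proteins = rng.normal(size=(8, 16))
    ligands = proteins + 0.1 * rng.normal(size=(8, 16))  # toy matched pairs
    print("loss:", info_nce(proteins, ligands))
    print("pocket:", select_pocket(rng.normal(size=(5, 16)), ligands[0])[0])
```

Under these assumptions, virtual screening and target fishing would reduce to ranking by the same similarity scores: ligands against a fixed protein embedding in the first case, and proteins against a fixed ligand embedding in the second.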