Machine learning approaches to Structure-Based Drug Design (SBDD) have proven quite fertile over the last few years. In particular, diffusion-based approaches to SBDD have shown great promise. We present a technique that expands on this diffusion approach in two crucial ways. First, we address the size disparity between the drug molecule and the target/receptor, which makes learning more challenging and inference slower. We do so through the notion of a Virtual Receptor, a compressed version of the receptor that is learned so as to preserve key aspects of the original receptor's structural information while respecting the relevant group equivariance. Second, we incorporate a protein language embedding originally used in the context of protein folding. We experimentally demonstrate the contributions of both the virtual receptors and the protein embeddings: in practice, they lead to both better performance and significantly faster computation.