Predicting the binding sites of target proteins plays a fundamental role in drug discovery. Most existing deep-learning methods treat a protein as a 3D image: they spatially cluster its atoms into voxels and feed the voxelized protein into a 3D CNN for prediction. However, these CNN-based methods have several critical limitations: 1) they represent irregular protein structures poorly; 2) they are sensitive to rotations; 3) they characterize the protein surface insufficiently; and 4) they do not account for data distribution shift. To address these issues, this work proposes EquiPocket, an E(3)-equivariant Graph Neural Network (GNN) for binding site prediction. EquiPocket consists of three modules: the first extracts local geometric information for each surface atom, the second models both the chemical and spatial structure of the protein, and the third captures the geometry of the surface via equivariant message passing over the surface atoms. We further propose a dense attention output layer to alleviate the data distribution shift caused by variable protein size. Extensive experiments on several representative benchmarks demonstrate the superiority of our framework over state-of-the-art methods.
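To make the notion of equivariant message passing concrete, the following is a minimal sketch of a single EGNN-style update on a toy point cloud, together with a numerical check that rotating and translating the input transforms the output coordinates in the same way while leaving the scalar features unchanged. It is an illustration only and is not EquiPocket's architecture: the function names (`egnn_layer`, `rotate_z`, `translate`), the fully connected graph, and the fixed weights `w_msg` and `w_pos` (standing in for learned networks) are all assumptions made for this example.

```python
import math
import random


def rotate_z(points, angle):
    """Rotate 3D points about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def translate(points, shift):
    """Shift every point by the same 3D vector."""
    return [(x + shift[0], y + shift[1], z + shift[2]) for x, y, z in points]


def egnn_layer(coords, feats, w_msg=0.5, w_pos=0.1):
    """One toy EGNN-style update on a fully connected graph.

    Messages depend only on scalar features and squared distances, so they
    are invariant. Coordinates move along relative vectors scaled by those
    messages, so they are equivariant. w_msg and w_pos are hypothetical
    fixed weights used here in place of learned MLPs.
    """
    n = len(coords)
    new_coords, new_feats = [], []
    for i in range(n):
        agg = 0.0
        delta = [0.0, 0.0, 0.0]
        for j in range(n):
            if i == j:
                continue
            rel = [coords[i][k] - coords[j][k] for k in range(3)]
            dist2 = sum(r * r for r in rel)
            # Invariant scalar message between atoms i and j.
            msg = math.tanh(w_msg * (feats[i] + feats[j]) - dist2)
            agg += msg
            for k in range(3):
                delta[k] += w_pos * msg * rel[k]
        new_feats.append(feats[i] + agg)
        new_coords.append(
            tuple(coords[i][k] + delta[k] / (n - 1) for k in range(3))
        )
    return new_coords, new_feats


def max_abs_diff(a, b):
    """Largest absolute coordinate difference between two point lists."""
    return max(abs(p - q) for u, v in zip(a, b) for p, q in zip(u, v))


random.seed(0)
coords = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(5)]
feats = [random.uniform(-1, 1) for _ in range(5)]
angle, shift = 0.7, (1.0, -2.0, 0.5)

# Path A: apply the layer, then rotate and translate the output.
out_coords, out_feats = egnn_layer(coords, feats)
expected_coords = translate(rotate_z(out_coords, angle), shift)

# Path B: rotate and translate the input, then apply the layer.
moved_coords, moved_feats = egnn_layer(
    translate(rotate_z(coords, angle), shift), feats
)

coord_err = max_abs_diff(moved_coords, expected_coords)
feat_err = max(abs(a - b) for a, b in zip(out_feats, moved_feats))
print("coordinate mismatch:", coord_err)
print("feature mismatch:", feat_err)
```

Both mismatches should be at the level of floating-point round-off. The check covers only a rotation about one axis plus a translation; full E(3) equivariance also includes arbitrary rotations and reflections, which hold here for the same reason (the update uses only distances and relative vectors). A voxel grid fed to a 3D CNN offers no such guarantee, since rotating the protein changes which voxels its atoms fall into.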