Protein Binding Site Prediction Based on an Edge-aware Graph Attention Network (Edge-aware GAT)

Accurate identification of protein binding sites is crucial for understanding biomolecular interaction mechanisms and for the rational design of drug targets. Traditional prediction methods often struggle to balance accuracy with computational efficiency when modeling complex spatial conformations. To address this challenge, we propose an Edge-aware Graph Attention Network (Edge-aware GAT) model for fine-grained prediction of binding sites for various biomolecular partners, including proteins, DNA/RNA, ions, ligands, and lipids. Our method constructs atom-level graphs and integrates multidimensional structural features, including geometric descriptors, DSSP-derived secondary structure, and relative solvent accessibility (RSA), to generate spatially aware embedding vectors. Interatomic distances and directional vectors enter the attention mechanism as edge features, so the attention weights depend on the geometry of each atom pair as well as on node features, which increases the model's representational capacity. On benchmark datasets, our model achieves an ROC-AUC of 0.93 for protein-protein binding site prediction, outperforming several state-of-the-art methods. Directional tensor propagation and residue-level attention pooling further improve both binding site localization and the capture of local structural details. PyMOL visualizations of the predictions illustrate the model's practical utility and interpretability. To facilitate community access and application, we have deployed a publicly accessible web server at http://119.45.201.89:5000/. In summary, our approach offers a novel and efficient solution that balances prediction accuracy, generalization, and interpretability for identifying functional sites in proteins.
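The abstract does not give the exact layer equations, so the following is only a minimal NumPy sketch of how edge features can enter an attention score and how atom embeddings can be pooled to residues. Everything in it is an assumption made for illustration, not the authors' implementation: the radius cutoff, the single attention head, the concatenation-based score, the LeakyReLU slope of 0.2, the choice to add the projected edge feature to each message, and all function and parameter names (`edge_features`, `edge_aware_attention`, `residue_attention_pool`, `W`, `We`, `a`, `q`).

```python
import numpy as np

def segment_softmax(scores, index, n_segments):
    """Softmax over all entries of `scores` that share the same `index`."""
    m = np.full(n_segments, -np.inf)
    np.maximum.at(m, index, scores)          # per-segment max for stability
    w = np.exp(scores - m[index])
    denom = np.zeros(n_segments)
    np.add.at(denom, index, w)
    return w / denom[index]

def edge_features(coords, cutoff=5.0):
    """Radius graph over atoms; each edge carries [distance, unit direction]."""
    diff = coords[None, :, :] - coords[:, None, :]   # diff[i, j] = x_j - x_i
    dist = np.linalg.norm(diff, axis=-1)
    src, dst = np.nonzero((dist < cutoff) & (dist > 0.0))  # no self-loops
    d = dist[src, dst]
    unit = diff[src, dst] / d[:, None]
    return src, dst, np.concatenate([d[:, None], unit], axis=1)  # (E, 4)

def edge_aware_attention(h, src, dst, e, W, We, a):
    """Single-head attention whose scores see node AND edge features.

    h: (N, F) atom features, e: (E, 4) edge features,
    W: (F, D), We: (4, D), a: (3 * D,). All weights are hypothetical.
    """
    z = h @ W                                 # projected atoms, (N, D)
    ze = e @ We                               # projected edges, (E, D)
    # Score of message src -> dst: LeakyReLU(a . [z_dst || z_src || ze]).
    s = np.concatenate([z[dst], z[src], ze], axis=1) @ a
    s = np.where(s > 0, s, 0.2 * s)
    alpha = segment_softmax(s, dst, h.shape[0])   # normalize per receiving atom
    out = np.zeros_like(z)
    np.add.at(out, dst, alpha[:, None] * (z[src] + ze))
    return out, alpha

def residue_attention_pool(h_atom, residue_index, q):
    """Pool atom embeddings into residue embeddings with learned weights q."""
    n_res = int(residue_index.max()) + 1
    alpha = segment_softmax(h_atom @ q, residue_index, n_res)
    pooled = np.zeros((n_res, h_atom.shape[1]))
    np.add.at(pooled, residue_index, alpha[:, None] * h_atom)
    return pooled, alpha

# Toy usage: four atoms on a line, two atoms per residue, random weights.
rng = np.random.default_rng(0)
coords = np.array([[0.0, 0, 0], [1.5, 0, 0], [3.0, 0, 0], [4.5, 0, 0]])
h = rng.normal(size=(4, 5))
src, dst, e = edge_features(coords, cutoff=2.0)
W, We, a = rng.normal(size=(5, 8)), rng.normal(size=(4, 8)), rng.normal(size=24)
h_out, alpha = edge_aware_attention(h, src, dst, e, W, We, a)
pooled, beta = residue_attention_pool(h_out, np.array([0, 0, 1, 1]), rng.normal(size=8))
```

In this sketch the distance and the unit direction vector are concatenated into one four-dimensional edge feature, so two atom pairs with identical node features but different geometry receive different attention weights. A trained model would learn `W`, `We`, `a`, and `q` by gradient descent and would typically stack several such layers before a per-residue classifier.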
