While there has been significant progress in evaluating and comparing different representations for learning on protein data, the role of surface-based learning approaches remains poorly understood. In particular, there is a lack of direct and fair benchmark comparisons between the best available surface-based learning methods and alternative representations such as graphs. Moreover, the few existing surface-based approaches either use surface information in isolation or, at best, perform global pooling between surface- and graph-based architectures. In this work, we fill this gap by first adapting a state-of-the-art surface encoder for protein learning tasks. We then perform a direct and fair comparison of the resulting method against alternative approaches within the Atom3D benchmark, highlighting the limitations of pure surface-based learning. Finally, we propose an integrated approach that allows learned feature sharing between graph and surface representations at the level of nodes and vertices $\textit{across all layers}$. We demonstrate that the resulting architecture achieves state-of-the-art results on all tasks in the Atom3D benchmark while adhering to the strict benchmark protocol, and more broadly on binding site identification and binding pocket classification. Furthermore, we use coarsened surfaces and optimize our approach for efficiency, making our tool competitive with existing techniques in training and inference time. Our code and data can be found online: $\texttt{github.com/Vincentx15/atomsurf}$
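To make the difference between global pooling and per-layer, node/vertex-level feature sharing concrete, the toy sketch below alternates, at every layer, an update within each representation with an exchange between graph nodes and surface vertices. It is a minimal illustration under stated assumptions, not the authors' architecture: the learned graph and surface layers are replaced by plain neighbour averaging, and each vertex is assumed to be linked to its single nearest graph node. All function names (`neighbor_mean`, `exchange`, `hybrid_forward`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = List[float]
Point = Tuple[float, float, float]


def _mean(vectors: Sequence[Vec]) -> Vec:
    """Element-wise mean of equally sized feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def nearest_index(p: Point, others: Sequence[Point]) -> int:
    """Index of the point in `others` closest to `p` (Euclidean distance)."""
    return min(range(len(others)), key=lambda i: math.dist(p, others[i]))


def neighbor_mean(feats: List[Vec], adjacency: List[List[int]]) -> List[Vec]:
    """Stand-in for a learned layer: average each element with its neighbours."""
    return [_mean([feats[i]] + [feats[j] for j in adjacency[i]])
            for i in range(len(feats))]


def exchange(node_feats, node_pos, vert_feats, vert_pos):
    """Hypothetical node/vertex feature sharing (assumption: nearest-node links).

    Each vertex averages in the features of its nearest graph node; each node
    averages in the mean features of the vertices assigned to it. Both updates
    read the same pre-exchange features.
    """
    v2n = [nearest_index(v, node_pos) for v in vert_pos]
    new_verts = [_mean([vert_feats[k], node_feats[v2n[k]]])
                 for k in range(len(vert_feats))]
    new_nodes = []
    for i, f in enumerate(node_feats):
        assigned = [vert_feats[k] for k in range(len(vert_feats)) if v2n[k] == i]
        # Nodes with no assigned vertex (e.g. buried atoms) keep their features.
        new_nodes.append(_mean([f, _mean(assigned)]) if assigned else list(f))
    return new_nodes, new_verts


def hybrid_forward(node_feats, node_adj, node_pos,
                   vert_feats, vert_adj, vert_pos, num_layers=2):
    """Interleave per-representation updates with exchange at every layer."""
    for _ in range(num_layers):
        node_feats = neighbor_mean(node_feats, node_adj)  # graph branch
        vert_feats = neighbor_mean(vert_feats, vert_adj)  # surface branch
        node_feats, vert_feats = exchange(node_feats, node_pos,
                                          vert_feats, vert_pos)
    return node_feats, vert_feats


if __name__ == "__main__":
    # Two graph nodes and three surface vertices with 1-D toy features.
    nodes, verts = hybrid_forward(
        node_feats=[[1.0], [3.0]], node_adj=[[1], [0]],
        node_pos=[(0, 0, 0), (2, 0, 0)],
        vert_feats=[[0.0], [4.0], [8.0]], vert_adj=[[1], [0, 2], [1]],
        vert_pos=[(0, 1, 0), (2, 1, 0), (2, -1, 0)],
        num_layers=1,
    )
    print(nodes, verts)  # [[2.0], [3.5]] [[2.0], [3.0], [4.0]]
```

In contrast with pooling each branch into a single vector and merging only at the end, information here crosses between representations at every layer, so later graph updates already see surface context and vice versa.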