Deep learning has become a crucial tool for studying proteins. While the importance of modeling protein structure has been discussed extensively in the literature, amino acid types are typically included in the input by default for many inference tasks. Using a structure alignment task, this study demonstrates that embedding amino acid types may, in some cases, not help a deep learning model learn a better representation. To this end, we propose ProtLOCA, a local geometry alignment method based solely on structural representations of amino acids. We evaluate ProtLOCA on a global structure-matching task over protein pairs, using an independent test dataset based on CATH labels. Our method outperforms existing sequence- and structure-based representation learning methods, matching structurally consistent protein domains more quickly and accurately. Furthermore, in local structure pairing tasks, ProtLOCA provides, for the first time, a valid solution for highlighting common local structures among proteins that share the same function but differ in overall structure. This suggests a new possibility for using deep learning methods to analyze protein structure and infer function.