In recent years, the scientific community has become increasingly interested in peptides containing non-canonical amino acids because of their superior stability and resistance to proteolytic degradation. Incorporating such residues offers a promising way to modify the biological, pharmacological, and physicochemical properties of both endogenous and engineered peptides. Despite these advantages, the field still lacks an effective pre-trained model that can extract feature representations from such complex peptide sequences. We propose PepLand, a novel pre-training architecture for representation and property analysis of peptides spanning both canonical and non-canonical amino acids. PepLand uses a multi-view heterogeneous graph neural network tailored to capture the subtle structural representations of peptides. Empirical evaluation demonstrates PepLand's effectiveness across a range of peptide property prediction tasks, including protein-protein interactions, permeability, solubility, and synthesizability. These results confirm PepLand's capability to capture salient features of synthetic peptides, laying a foundation for further advances in peptide-centric research. All source code used in this study is publicly available on GitHub at https://github.com/zhangruochi/pepland.