Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry

From arXiv: Ebrahimi S., Guo X. Transformer-based de novo peptide sequencing for data-independent acquisition mass spectrometry. In 2023 IEEE 23rd International Conference on Bioinformatics and Bioengineering (BIBE) 2022 Dec 6 (pp. 17-22). IEEE.

Tandem mass spectrometry (MS/MS) is the predominant high-throughput technique for comprehensively analyzing the protein content of biological samples, and it is a cornerstone of progress in proteomics. In recent years, substantial strides have been made in Data-Independent Acquisition (DIA) strategies, which fragment precursor ions in an unbiased, untargeted manner. DIA-generated MS/MS spectra are hard to interpret because they are highly multiplexed: each spectrum contains fragment (product) ions originating from multiple precursor peptides. This is a particularly acute challenge in de novo peptide/protein sequencing, where current methods are ill-equipped to handle multiplexed spectra. In this paper, we introduce DiaTrans (also referred to as Casanovo-DIA), a deep-learning model based on the transformer architecture that deciphers peptide sequences from DIA mass spectrometry data. Our results show significant improvements over existing state-of-the-art (SOTA) methods, including DeepNovo-DIA and PepNet. At the amino acid level, DiaTrans improves precision by 15.14% to 34.8% and recall by 11.62% to 31.94%; at the peptide level, it improves precision by 59% to 81.36%. Integrating DIA data with our DiaTrans model holds considerable promise for uncovering novel peptides and for profiling biological samples more comprehensively. DiaTrans is freely available under the GNU GPL license at https://github.com/Biocomputing-Research-Group/DiaTrans.
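The abstract reports precision and recall at the amino acid level and precision at the peptide level, but does not define them here. As a rough illustration only, the Python sketch below computes these three quantities under simplifying assumptions that are not taken from the paper: peptides are plain one-letter strings without modifications, a predicted residue counts as matched only if it equals the reference residue at the same position, and a missing prediction is represented by `None`. The function names are hypothetical, and published evaluations typically use a mass-tolerance-based residue match instead.

```python
from typing import Optional, Sequence


def count_matched_residues(predicted: str, reference: str) -> int:
    """Count positions where predicted and reference residues agree.

    Simplified assumption: exact position-wise comparison, no mass tolerance.
    """
    return sum(p == r for p, r in zip(predicted, reference))


def sequencing_metrics(
    predictions: Sequence[Optional[str]], references: Sequence[str]
) -> dict:
    """Return amino-acid precision/recall and peptide precision (illustrative)."""
    matched = predicted_residues = reference_residues = 0
    correct_peptides = predicted_peptides = 0

    for pred, ref in zip(predictions, references):
        reference_residues += len(ref)
        if pred is None:  # the model produced no sequence for this spectrum
            continue
        predicted_peptides += 1
        predicted_residues += len(pred)
        matched += count_matched_residues(pred, ref)
        if pred == ref:  # peptide counts as correct only if every residue matches
            correct_peptides += 1

    return {
        "aa_precision": matched / predicted_residues if predicted_residues else 0.0,
        "aa_recall": matched / reference_residues if reference_residues else 0.0,
        "peptide_precision": (
            correct_peptides / predicted_peptides if predicted_peptides else 0.0
        ),
    }


if __name__ == "__main__":
    # Toy data, invented for illustration; not results from the paper.
    preds = ["PEPTIDE", "PEPTLDE", None]
    refs = ["PEPTIDE", "PEPTIDE", "ACDK"]
    print(sequencing_metrics(preds, refs))
```

On the toy input, 13 of 14 predicted residues and 13 of 18 reference residues match, and one of the two predicted peptides is fully correct. Precision is computed over what the model predicted, while recall is computed over everything in the reference, so unanswered spectra lower recall but not precision.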

