Liquid chromatography-mass spectrometry (LC-MS)-based metabolomics and exposomics aim to measure the detectable small molecules in biological samples. The results support hypothesis-generating discovery of metabolic changes and disease mechanisms, and provide information about environmental exposures and their effects on human health. Both fields are made possible by the high resolving power of LC and the high mass measurement accuracy of MS. However, the majority of signals from such studies still cannot be identified or annotated by conventional library searching, because existing spectral libraries are far from covering the vast chemical space captured by LC-MS/MS. To address this challenge and realize the full potential of metabolomics and exposomics, a number of computational approaches have been developed to predict compounds from tandem mass spectra. Published assessments of these approaches used different datasets and evaluation criteria, which makes their reported performance difficult to compare. To select prediction workflows for practical applications and to identify areas for further improvement, we carried out a systematic evaluation of state-of-the-art prediction algorithms. Specifically, we evaluated the accuracy of formula prediction and structure prediction for different types of adducts. The findings establish realistic performance baselines, identify critical bottlenecks, and provide guidance for further improving MS-based compound prediction.