Background: Missing data are a common challenge in mass spectrometry-based metabolomics and can lead to biased and incomplete analyses. Integrating whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising way to improve imputation accuracy in metabolomics studies. Method: In this study, we propose a novel method that uses WGS data and reference metabolites to impute unknown metabolites. Our approach uses a multi-view variational autoencoder to jointly model burden scores, polygenic risk scores (PGS), and linkage disequilibrium (LD)-pruned single nucleotide polymorphisms (SNPs) for feature extraction and imputation of missing metabolomics data. By learning latent representations of both omics data types, the method can impute missing metabolomics values from genomic information. Results: We evaluate the method on empirical metabolomics datasets with missing values and show that it outperforms conventional imputation techniques. Using burden scores, PGS, and LD-pruned SNPs derived from 35 template metabolites, the proposed method achieved R² scores > 0.01 for 71.55% of metabolites. Conclusion: Integrating WGS data into metabolomics imputation not only improves data completeness but also enhances downstream analyses, paving the way for more comprehensive and accurate investigations of metabolic pathways and disease associations. Our findings highlight the potential benefits of using WGS data for metabolomics data imputation and underscore the importance of multi-modal data integration in precision medicine research.
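As an illustration of how a summary such as "R² > 0.01 for 71.55% of metabolites" could be computed, the following is a minimal sketch and not the authors' evaluation code. It assumes that the reported score is the coefficient of determination (1 − SS_res / SS_tot) computed per metabolite on held-out observed values; the function names `r2_score` and `share_above_threshold` and the toy data are hypothetical.

```python
from typing import Dict, Sequence


def r2_score(observed: Sequence[float], imputed: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(imputed) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((y - mean_obs) ** 2 for y in observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, imputed))
    if ss_tot == 0.0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def share_above_threshold(
    observed_by_metabolite: Dict[str, Sequence[float]],
    imputed_by_metabolite: Dict[str, Sequence[float]],
    threshold: float = 0.01,
) -> float:
    """Fraction of metabolites whose per-metabolite R^2 exceeds the threshold."""
    scores = [
        r2_score(observed_by_metabolite[name], imputed_by_metabolite[name])
        for name in observed_by_metabolite
    ]
    return sum(score > threshold for score in scores) / len(scores)


# Toy data (hypothetical): held-out observed values and their imputed counterparts.
observed = {
    "metabolite_a": [1.0, 2.0, 3.0, 4.0],
    "metabolite_b": [1.0, 2.0, 3.0, 4.0],
}
imputed = {
    "metabolite_a": [1.0, 2.0, 3.0, 4.0],  # perfect imputation, R^2 = 1
    "metabolite_b": [2.5, 2.5, 2.5, 2.5],  # always predicts the mean, R^2 = 0
}

share = share_above_threshold(observed, imputed)
print(f"{share:.2%} of metabolites have R^2 > 0.01")
```

Under this definition, a model that always predicts the observed mean scores exactly 0, so the 0.01 threshold separates metabolites for which the genomic features explain at least some variance from those for which they explain none.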