Microbiome omics data, including 16S rRNA sequencing data, reveal intriguing dynamic associations between the human microbiome and various disease states. Drastic changes in the microbiota can be associated with factors such as diet, hormonal cycles, diseases, and medical interventions. Beyond the identification of specific bacterial taxa associated with diseases, recent advances provide evidence that metabolism, genetics, and environmental factors can modulate these microbial effects. However, current analytic methods for integrating microbiome data are not fully developed to address the main challenges of longitudinal metagenomics data, such as high dimensionality, intra-sample dependence, and zero inflation of the observed counts. Hence, we propose a Bayes factor approach to model selection among negative binomial, Poisson, zero-inflated negative binomial, and zero-inflated Poisson models under a non-informative Jeffreys prior. In both simulation studies and real data analysis, our Bayes factor approach markedly outperforms the traditional Akaike information criterion and Vuong's test. We also introduce a new R package, BFZINBZIP, which performs the simulation studies and real data analysis and facilitates Bayesian model selection based on the Bayes factor.
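A Bayes factor compares two candidate count models through the ratio of their marginal likelihoods. As a minimal illustration, the Python sketch below computes a log Bayes factor between a Poisson and a zero-inflated Poisson (ZIP) model for a single vector of i.i.d. counts. It is not the BFZINBZIP implementation and makes several loudly stated simplifying assumptions: it replaces the non-informative Jeffreys prior with proper priors (a Gamma(a, b) prior on the Poisson rate λ and a Uniform(0, 1) prior on the zero-inflation probability π), chosen only because they give closed-form marginal likelihoods; it ignores covariates and longitudinal dependence; and it omits the negative binomial models. All function names are hypothetical.

```python
import math

# Illustrative sketch only: proper Gamma / Uniform priors are ASSUMED here in
# place of the Jeffreys prior; no covariates, no repeated-measures structure.


def log_gamma_poisson_kernel(total, exposure, a, b):
    # log of the integral over lambda of lambda**total * exp(-exposure * lambda)
    # against a Gamma(shape=a, rate=b) prior density (closed form).
    return (a * math.log(b) + math.lgamma(a + total) - math.lgamma(a)
            - (a + total) * math.log(b + exposure))


def log_beta(x, y):
    # log of the Beta function B(x, y).
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def log_sum_exp(values):
    # Numerically stable log(sum(exp(v) for v in values)).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_marginal_poisson(y, a=1.0, b=1.0):
    # log p(y | Poisson) with lambda ~ Gamma(a, b) integrated out.
    log_fact = sum(math.lgamma(v + 1) for v in y)
    return log_gamma_poisson_kernel(sum(y), len(y), a, b) - log_fact


def log_marginal_zip(y, a=1.0, b=1.0):
    # log p(y | ZIP) with lambda ~ Gamma(a, b) and pi ~ Uniform(0, 1)
    # integrated out. Each zero contributes pi + (1 - pi) * exp(-lambda);
    # expanding that product binomially over the n0 zeros gives a finite sum
    # whose terms integrate in closed form.
    n0 = sum(1 for v in y if v == 0)
    positives = [v for v in y if v > 0]
    n_pos, total = len(positives), sum(positives)
    log_fact = sum(math.lgamma(v + 1) for v in positives)
    terms = []
    for k in range(n0 + 1):
        # k zeros come from the Poisson component, n0 - k from structural zeros.
        log_binom = (math.lgamma(n0 + 1) - math.lgamma(k + 1)
                     - math.lgamma(n0 - k + 1))
        log_pi = log_beta(n0 - k + 1, k + n_pos + 1)
        log_lam = log_gamma_poisson_kernel(total, k + n_pos, a, b)
        terms.append(log_binom + log_pi + log_lam)
    return log_sum_exp(terms) - log_fact


def log_bayes_factor_zip_vs_poisson(y, a=1.0, b=1.0):
    # log BF > 0 favours ZIP; log BF < 0 favours the plain Poisson model.
    return log_marginal_zip(y, a, b) - log_marginal_poisson(y, a, b)
```

For a sample with far more zeros than a Poisson with the same mean would produce, for example thirty zeros followed by ten counts between 4 and 7, the log Bayes factor comes out strongly positive; for counts whose zeros are no more frequent than a Poisson predicts, it is negative, because the extra parameter π is penalized through the marginal likelihood rather than by an explicit term as in AIC.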