Ongoing advances in microbiome profiling have allowed unprecedented insights into the molecular activities of microbial communities. This has fueled a strong scientific interest in understanding the critical role the microbiome plays in governing human health, by identifying microbial features associated with clinical outcomes of interest. Several aspects of microbiome data limit the applicability of existing variable selection approaches. In particular, microbiome data are high-dimensional, extremely sparse, and compositional. Importantly, many of the observed features, although categorized as different taxa, may play related functional roles. To address these challenges, we propose a novel compositional regression approach that leverages the data-adaptive clustering and variable selection properties of the spiked Dirichlet process to identify taxa that exhibit similar functional roles. Our proposed method, Bayesian Regression with Agglomerated Compositional Effects using a dirichLET process (BRACElet), enables the identification of a sparse set of features with shared impacts on the outcome, facilitating dimension reduction and model interpretation. We demonstrate that BRACElet outperforms existing approaches for microbiome variable selection through simulation studies and an application elucidating the impact of oral microbiome composition on insulin resistance.