Ongoing advances in microbiome profiling have allowed unprecedented insights into the molecular activities of microbial communities. This has fueled a strong scientific interest in understanding the critical role the microbiome plays in governing human health, by identifying microbial features associated with clinical outcomes of interest. Several aspects of microbiome data limit the applicability of existing variable selection approaches. In particular, microbiome data are high-dimensional, extremely sparse, and compositional. Importantly, many of the observed features, although categorized as different taxa, may play related functional roles. To address these challenges, we propose a novel compositional regression approach that leverages the data-adaptive clustering and variable selection properties of the spiked Dirichlet process to identify taxa that exhibit similar functional roles. Our proposed method, Bayesian Regression with Agglomerated Compositional Effects using a dirichLET process (BRACElet), enables the identification of a sparse set of features with shared impacts on the outcome, facilitating dimension reduction and model interpretation. We demonstrate that BRACElet outperforms existing approaches for microbiome variable selection through simulation studies and an application elucidating the impact of oral microbiome composition on insulin resistance.