Advances in next-generation sequencing technology have enabled high-throughput profiling of metagenomes and accelerated microbiome research. Recently, there has been a rise in quantitative studies that aim to decipher the microbiome co-occurrence network and its underlying community structure from metagenomic sequence data. Uncovering this complex community structure is essential to understanding the role of the microbiome in disease progression and susceptibility. Taxonomic abundance data generated by metagenomic sequencing are high-dimensional and compositional, and they suffer from uneven sampling depth, over-dispersion, and zero-inflation. These characteristics often undermine the reliability of current methods for microbiome community detection. To address these challenges, we propose a Bayesian stochastic block model for studying the microbiome co-occurrence network, built on the recently developed modified centered log-ratio (mclr) transformation, which is tailored to microbiome data. Our model incorporates taxonomic tree information through a Markov random field prior, and the model parameters are jointly inferred using Markov chain Monte Carlo sampling. In our simulation study, the proposed approach performed better than competing methods even when the taxonomic tree information was non-informative. We applied our approach to a real urinary microbiome dataset from postmenopausal women, marking the first study of the urinary microbiome co-occurrence network structure. In summary, this statistical methodology provides a new tool for facilitating advanced microbiome studies.
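To make the transformation step concrete, the following is a minimal sketch of a modified centered log-ratio transform as it is commonly described: zeros are left at zero, while the nonzero counts in each sample are log-transformed, centered by the mean log of that sample's nonzero counts, and then shifted by a common positive constant so that they stay strictly positive. The function name `mclr`, the `shift` parameter, and the particular choice of constant are assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np


def mclr(counts, shift=1.0):
    """Illustrative modified centered log-ratio transform (rows are samples).

    Assumption: the common constant is |smallest centered value| + shift,
    which is one way to keep every nonzero entry strictly positive.
    """
    counts = np.asarray(counts, dtype=float)
    nonzero = counts > 0

    # Log of the nonzero counts; zero counts are left at 0 for now.
    log_counts = np.zeros_like(counts)
    log_counts[nonzero] = np.log(counts[nonzero])

    # Per-sample mean of the logs over nonzero entries only
    # (this is the log of the geometric mean of the nonzero counts).
    n_nonzero = np.maximum(nonzero.sum(axis=1, keepdims=True), 1)
    mean_log = log_counts.sum(axis=1, keepdims=True) / n_nonzero

    # Center the nonzero entries; zeros remain exactly zero.
    centered = np.where(nonzero, log_counts - mean_log, 0.0)
    if not nonzero.any():
        return centered

    # Shift all nonzero entries by one common positive constant.
    eps = abs(centered[nonzero].min()) + shift
    return np.where(nonzero, centered + eps, 0.0)


# Hypothetical toy table: 2 samples (rows) by 3 taxa (columns).
counts = np.array([[0, 10, 1000],
                   [5, 5, 0]])
z = mclr(counts)
print(z)
```

Because each sample is centered by its own mean log count, multiplying a sample by a constant leaves the result unchanged, which is why a transform of this kind is insensitive to uneven sampling depth. Keeping zeros at zero avoids adding pseudocounts to zero-inflated data.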