Biological sequencing data consist of read counts, e.g. of specified taxa, and often exhibit sparsity (zero-count inflation) and overdispersion (extra-Poisson variability). As most sequencing techniques provide an arbitrary total count, taxon-specific counts should ideally be treated as proportions under the compositional data-analytic framework. There is increasing interest in the role of gut microbiome composition in mediating the effects of different exposures on health outcomes. Most previous approaches to compositional mediation have addressed the problem of identifying potentially mediating taxa among a large number of candidates. Here we consider causal inference in compositional mediation when a priori knowledge is available about the hierarchy for a restricted number of taxa, building on a single hypothesis structured in terms of contrasts between appropriate sub-compositions. Based on the theory of multiple contemporaneous mediators and the assumed causal graph, we define non-parametric estimands for overall and coordinate-wise mediation effects, and show how these indirect effects can be estimated from empirical data using simple parametric linear models. The mediators have straightforward and coherent interpretations, related to specific causal questions about the interrelationships between the sub-compositions. We perform a simulation study focusing on the impact of sparsity and overdispersion on the estimation of mediation effects. While the estimators are unbiased, their precision depends in a complex manner, for any given magnitude of indirect effect, on sparsity and on the relative magnitudes of the exposure-to-mediator and mediator-to-outcome effects. We demonstrate the approach on empirical data, finding an inverse association between fibre intake and insulin level that is mainly attributable to direct rather than indirect effects.
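As an illustration of the general idea described in the abstract, the following is a minimal sketch of how contrasts between sub-compositions could be turned into mediators and how coordinate-wise indirect effects could be estimated with simple linear models. It is not the authors' estimator: the function names (`balance`, `ols`, `mediation_effects`), the pseudocount used to handle zero counts, and the product-of-coefficients approach without confounders or exposure-mediator interaction are all assumptions made for this example.

```python
import numpy as np

def balance(counts, num_idx, den_idx, pseudo=0.5):
    """Log-contrast (ilr-type balance) between two groups of taxa.

    `pseudo` is an assumed pseudocount that keeps the logarithm finite
    for zero counts; the choice of 0.5 is illustrative only.
    """
    x = np.asarray(counts, dtype=float) + pseudo
    r, s = len(num_idx), len(den_idx)
    # Log geometric means of the two sub-compositions, per sample.
    log_gm_num = np.log(x[:, num_idx]).mean(axis=1)
    log_gm_den = np.log(x[:, den_idx]).mean(axis=1)
    return np.sqrt(r * s / (r + s)) * (log_gm_num - log_gm_den)

def ols(X, y):
    """Least-squares coefficients, intercept first."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def mediation_effects(a, M, y):
    """Product-of-coefficients effects for contemporaneous mediators.

    a: exposure (n,), M: mediators (n, k), y: outcome (n,).
    Returns (direct effect, coordinate-wise indirect effects, their sum).
    Assumes linear models with no confounders and no interactions.
    """
    M = np.asarray(M, dtype=float)
    k = M.shape[1]
    # Exposure-to-mediator slopes, one regression per mediator.
    alpha = np.array([ols(a, M[:, j])[1] for j in range(k)])
    # Outcome model: exposure plus all mediators jointly.
    coef = ols(np.column_stack([a, M]), y)
    direct, beta = coef[1], coef[2:]
    indirect = alpha * beta
    return direct, indirect, indirect.sum()
```

In this sketch each column of `M` would be one balance, for example `balance(counts, [0, 1], [2, 3])` for the contrast between taxa 0 and 1 and taxa 2 and 3. Because the balance is a log-ratio, it does not depend on the arbitrary total count when no pseudocount is added.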