Differential abundance analysis is a key component of microbiome studies. Although dozens of methods exist, there is currently no consensus on which to prefer. Correctness of results in differential abundance analysis is an ambiguous concept that cannot be evaluated without simulated data, but we argue that consistency of results across datasets should be considered an essential quality of a well-performing method. We compared the performance of 14 differential abundance analysis methods using datasets from 54 taxonomic profiling studies based on 16S rRNA gene or shotgun sequencing. For each method, we examined how well the results replicated between random partitions of each dataset and between datasets from independent studies. While certain methods showed good consistency, some widely used methods produced a substantial number of conflicting findings. Overall, the highest consistency without unnecessary reduction in sensitivity was attained by analyzing relative abundances with a non-parametric method (Wilcoxon test or ordinal regression model) or linear regression (MaAsLin2). Comparable performance was attained by analyzing presence/absence of taxa with logistic regression.
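The idea of checking replication between random partitions of a dataset can be illustrated with a minimal Python sketch. It is not the pipeline used in the study, and several choices are assumptions made only for illustration: the split is stratified by group, the Wilcoxon rank-sum test uses a tie-corrected normal approximation without continuity correction, the significance level of 0.05 is not adjusted for multiple testing, and all function names (`rank_sum_test`, `call_taxa`, `split_half_consistency`, and so on) are hypothetical. A finding counts as replicated when a taxon is significant in the same direction in both halves, and as conflicting when it is significant in opposite directions.

```python
import math
import random


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation with tie correction).

    Returns (direction, p_value); direction is +1 if x tends to exceed y,
    -1 if x tends to be smaller, and 0 if there is no difference.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        return 0, 1.0
    n = n1 + n2
    pooled = sorted((v, g) for g, grp in enumerate((x, y)) for v in grp)
    rank_sum_x, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1..j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        t = j - i
        tie_term += t**3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every value is tied, so the test is uninformative
        return 0, 1.0
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return (z > 0) - (z < 0), math.erfc(abs(z) / math.sqrt(2))


def relative_abundances(counts):
    """Convert one sample's taxon counts to proportions."""
    total = sum(counts)
    return [c / total for c in counts]


def call_taxa(samples, labels, alpha):
    """Per taxon: +1/-1 for a significant increase/decrease in group 1, else 0."""
    calls = []
    for t in range(len(samples[0])):
        cases = [s[t] for s, g in zip(samples, labels) if g == 1]
        controls = [s[t] for s, g in zip(samples, labels) if g == 0]
        direction, p = rank_sum_test(cases, controls)
        calls.append(direction if p < alpha else 0)
    return calls


def stratified_halves(labels, rng):
    """Randomly split sample indices into two halves, separately within each group."""
    halves = ([], [])
    for group in sorted(set(labels)):
        members = [i for i, g in enumerate(labels) if g == group]
        rng.shuffle(members)
        mid = len(members) // 2
        halves[0].extend(members[:mid])
        halves[1].extend(members[mid:])
    return halves


def split_half_consistency(samples, labels, alpha=0.05, seed=0):
    """Return (replicated, conflicting) taxon counts between two random halves."""
    rng = random.Random(seed)
    calls_a, calls_b = (
        call_taxa([samples[i] for i in part], [labels[i] for i in part], alpha)
        for part in stratified_halves(labels, rng)
    )
    replicated = sum(1 for a, b in zip(calls_a, calls_b) if a != 0 and a == b)
    conflicting = sum(1 for a, b in zip(calls_a, calls_b) if a * b == -1)
    return replicated, conflicting


if __name__ == "__main__":
    # Synthetic toy data: taxon 0 is enriched and taxon 1 depleted in cases.
    cases = [[50 + i, 30 - i, 20] for i in range(20)]
    controls = [[20 + i, 60 - i, 20] for i in range(20)]
    samples = [relative_abundances(c) for c in cases + controls]
    labels = [1] * 20 + [0] * 20
    print(split_half_consistency(samples, labels))
```

On real data, the split would be repeated over many random seeds and the counts aggregated, so that a method's tendency to produce conflicting findings does not depend on a single partition.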