Differential abundance analysis is a key component of microbiome studies. Although dozens of methods exist, there is currently no consensus on which ones are preferable. The correctness of differential abundance results is an ambiguous concept that cannot be fully evaluated without a known ground truth, which requires simulated data; nevertheless, we argue that a well-performing method should produce highly reproducible results. We compared the performance of 14 differential abundance analysis methods using datasets from 53 taxonomic profiling studies based on 16S rRNA gene or shotgun metagenomic sequencing. For each method, we examined how well the results replicated between random partitions of each dataset and between datasets from separate studies. While certain methods showed good consistency, some widely used methods produced a substantial number of conflicting findings. Overall, when considering consistency together with sensitivity, the best performance was attained by analyzing relative abundances with a non-parametric method (Wilcoxon test or ordinal regression model) or with linear regression/t-test. Moreover, comparable performance was obtained by analyzing the presence/absence of taxa with logistic regression.
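The random-partition replication check can be illustrated with a small sketch. This is not the authors' pipeline. It assumes total-sum scaling to obtain relative abundances, a two-sided Wilcoxon rank-sum test using the normal approximation with tie correction, a significance threshold of 0.05 with no multiple-testing correction, and a single random split stratified by group. All helper names and the simulated counts are hypothetical. A taxon counts as a conflicting finding when it is significant in both halves but in opposite directions.

```python
import math
import random


def relative_abundance(counts):
    """Total-sum scaling: convert each sample's counts to proportions."""
    out = []
    for row in counts:
        total = sum(row)
        out.append([c / total if total else 0.0 for c in row])
    return out


def average_ranks(values):
    """Rank values 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).
    Returns (z, p); z > 0 means values in x tend to be larger than in y."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # all values identical: no evidence of a difference
        return 0.0, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return z, math.erfc(abs(z) / math.sqrt(2))


def significant_directions(rel, labels, alpha=0.05):
    """Map taxon index -> +1 (higher in group 1) or -1 for taxa with p < alpha."""
    result = {}
    for t in range(len(rel[0])):
        x = [row[t] for row, g in zip(rel, labels) if g == 1]
        y = [row[t] for row, g in zip(rel, labels) if g == 0]
        z, p = rank_sum_test(x, y)
        if p < alpha:
            result[t] = 1 if z > 0 else -1
    return result


def split_half_consistency(rel, labels, seed=0, alpha=0.05):
    """Split samples at random into two halves (stratified by group), test each
    half separately, and return (consistent, conflicting) counts of taxa that
    are significant in both halves with the same or opposite direction."""
    rng = random.Random(seed)
    halves = ([], [])
    for g in (0, 1):
        idx = [i for i, lab in enumerate(labels) if lab == g]
        rng.shuffle(idx)
        halves[0].extend(idx[: len(idx) // 2])
        halves[1].extend(idx[len(idx) // 2:])
    found = [significant_directions([rel[i] for i in h], [labels[i] for i in h], alpha)
             for h in halves]
    shared = set(found[0]) & set(found[1])
    consistent = sum(1 for t in shared if found[0][t] == found[1][t])
    return consistent, len(shared) - consistent


if __name__ == "__main__":
    rng = random.Random(42)
    labels = [0] * 40 + [1] * 40
    # Hypothetical counts: taxon 0 is enriched in group 1; taxa 1-4 have the same
    # count distribution in both groups, but their *relative* abundances still drop
    # in group 1 because the proportions must sum to one (compositionality).
    counts = [[rng.randint(50, 100) + (80 if g == 1 else 0)]
              + [rng.randint(20, 60) for _ in range(4)] for g in labels]
    print(split_half_consistency(relative_abundance(counts), labels))
```

Repeating the split over many seeds and averaging the conflicting fraction would give a more stable reproducibility estimate than the single split shown here.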