Elementary methods provide more replicable results in microbial differential abundance analysis

Differential abundance analysis is a key component of microbiome studies. It assesses the magnitude and statistical significance of differences in microbial abundances between conditions. Although dozens of methods for differential abundance analysis exist, they have been reported to produce remarkably discordant results, and there is currently no consensus on which methods to prefer. While correctness of results in differential abundance analysis is an ambiguous concept that cannot be evaluated without employing simulated data, we argue that consistency of results across datasets should be considered an essential quality of a well-performing method. We compared the performance of 13 differential abundance analysis methods using datasets from 54 taxonomic profiling studies based on 16S rRNA gene or shotgun sequencing. For each method, we examined how the results replicated between random partitions of each dataset and between datasets from independent studies. While certain methods showed good consistency, some widely used methods produced a substantial number of conflicting findings. Overall, the highest consistency without unnecessary reduction in sensitivity was attained by analyzing total sum scaling (TSS) normalized counts with a non-parametric method (Wilcoxon test or ordinal regression model) or linear regression (MaAsLin2). Comparable performance was also attained by analyzing presence/absence of taxa with logistic regression. In conclusion, while numerous sophisticated methods for differential abundance analysis have been developed, elementary methods seem to provide more consistent results without unnecessarily compromising sensitivity. We therefore suggest that elementary methods should be preferred in microbial differential abundance analysis when replicability needs to be emphasized.
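
To make the elementary approach concrete, the following is a minimal sketch of TSS normalization followed by a per-taxon Wilcoxon rank-sum test, with a split-half check that counts replicated and conflicting findings. It is not the authors' pipeline. All function names are hypothetical. The test uses a normal approximation with tie correction and no continuity correction, no multiple-testing adjustment is applied, and the stratified half split is an assumption, since the abstract does not specify how the random partitions were drawn.

```python
import math
import random

def tss_normalize(counts):
    """Total sum scaling: divide each sample's counts by that sample's total."""
    out = []
    for sample in counts:
        total = sum(sample)
        out.append([c / total if total > 0 else 0.0 for c in sample])
    return out

def rank_with_ties(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_rank_sum(x, y):
    """Two-sided rank-sum test (normal approximation). Returns (z, p)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        return 0.0, 1.0
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = rank_with_ties(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for x
    tie_sizes = {}
    for v in combined:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 0.0, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return z, math.erfc(abs(z) / math.sqrt(2))

def differential_abundance(counts, groups, alpha=0.05):
    """Per-taxon call: +1 (higher in group 1), -1 (lower), 0 (not significant)."""
    props = tss_normalize(counts)
    calls = []
    for t in range(len(counts[0])):
        case = [p[t] for p, g in zip(props, groups) if g == 1]
        ctrl = [p[t] for p, g in zip(props, groups) if g == 0]
        z, p_value = wilcoxon_rank_sum(case, ctrl)
        calls.append(0 if p_value >= alpha else (1 if z > 0 else -1))
    return calls

def split_half_replication(counts, groups, seed=0):
    """Split samples into two halves (stratified by group) and compare calls.

    Returns (replicated, conflicting): taxa significant in the same direction
    in both halves, and taxa significant in opposite directions.
    """
    rng = random.Random(seed)
    halves = ([], [])
    for g in (0, 1):
        members = [i for i, lab in enumerate(groups) if lab == g]
        rng.shuffle(members)
        mid = len(members) // 2
        halves[0].extend(members[:mid])
        halves[1].extend(members[mid:])
    calls = [
        differential_abundance([counts[i] for i in h], [groups[i] for i in h])
        for h in halves
    ]
    replicated = sum(1 for a, b in zip(*calls) if a != 0 and a == b)
    conflicting = sum(1 for a, b in zip(*calls) if a * b == -1)
    return replicated, conflicting
```

Here `counts` is a list of samples, each a list of per-taxon read counts, and `groups` is a matching list of 0/1 condition labels. Because TSS proportions are compositional, a large increase in one taxon lowers the proportions of the others, so a taxon with unchanged counts can still be called as decreased.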
