Determining an appropriate sample size for a study is a crucial step in planning scientific research. Appropriate sample size planning avoids both inadequate and inflated sample sizes. Inflated sample sizes waste resources, the time and effort of human subjects, and the lives of experimental animals. Inadequate sample sizes, a much more common problem, waste even more resources because they cannot detect biologically meaningful differences, and they encourage questionable research practices such as $p$-hacking. Microbiome studies are especially prone to small sample sizes, particularly those involving human subjects or expensive animal models. In practice, the statistical power for each taxon in a differential abundance study depends on the effect size (typically quantified as fold change), the mean abundance of that taxon, and the number of samples. We present a novel approach to sample size calculation for differential abundance studies as a function of effect size, mean abundance, and the statistical power of taxa. Our method is implemented in the power.nb R package, available at https://michaelagronah.com/power.nb/articles/stub.html. We applied our model to sample size calculation using estimates of taxon mean abundance and fold change obtained from thirty real-world microbiome datasets. Our results show that differential abundance microbiome studies require larger sample sizes than are currently prevalent in the literature to achieve adequate statistical power. Our framework will help researchers make informed decisions about appropriate sample sizes.
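The abstract ties the required sample size to fold change, mean abundance, and target power. The sketch below illustrates that dependence under assumptions that are ours, not the paper's. It does not reproduce the power.nb method. We assume that counts for one taxon follow a negative binomial distribution with mean $\mu$ and variance $\mu + \phi\mu^2$, that both groups share a common dispersion $\phi$ and have equal size, and that the log fold change is tested with a two-sided Wald test. The delta method gives a per-group size of roughly $n = (z_{1-\alpha/2} + z_{1-\beta})^2 (1/\mu_0 + 1/\mu_1 + 2\phi) / (\log \mathrm{FC})^2$. The function name `nb_sample_size_per_group` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def nb_sample_size_per_group(mean_abundance, fold_change, dispersion,
                             power=0.8, alpha=0.05):
    """Hypothetical helper: approximate per-group sample size for detecting a
    fold change in one taxon, ASSUMING negative binomial counts with a common
    dispersion, equal group sizes, and a two-sided Wald test on log fold change.
    This is an illustrative approximation, not the power.nb method."""
    if mean_abundance <= 0 or dispersion < 0:
        raise ValueError("mean_abundance must be > 0 and dispersion >= 0")
    if fold_change <= 0 or fold_change == 1:
        raise ValueError("fold_change must be positive and different from 1")
    mu0 = mean_abundance                  # mean count in the reference group
    mu1 = fold_change * mean_abundance    # mean count in the comparison group
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for target power
    # Delta method: Var(log of NB sample mean) is about (1/mu + phi) / n per group,
    # so the log fold change estimate has variance (1/mu0 + 1/mu1 + 2*phi) / n.
    var_per_obs = 1 / mu0 + 1 / mu1 + 2 * dispersion
    n = (z_alpha + z_beta) ** 2 * var_per_obs / math.log(fold_change) ** 2
    return math.ceil(n)  # round up to a whole number of samples per group


if __name__ == "__main__":
    # Rarer taxa and smaller fold changes require more samples per group.
    for mean in (1, 10, 100):
        for fc in (1.5, 2.0):
            print(mean, fc, nb_sample_size_per_group(mean, fc, dispersion=0.5))
```

In this approximation the $1/\mu$ terms drive up the sample size for low-abundance taxa. The $2\phi$ term sets a floor that persists even for highly abundant taxa, which is consistent with the abstract's point that power varies with mean abundance as well as fold change.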