Redundant multiple testing corrections: The fallacy of using family-based error rates to make inferences about individual hypotheses

During multiple testing, researchers often adjust their alpha level to control the familywise error rate for a statistical inference about a joint union alternative hypothesis (e.g., "H1 or H2"). However, in some cases, they do not make this inference. Instead, they make separate inferences about each of the individual hypotheses that comprise the joint hypothesis (e.g., H1 and H2). For example, a researcher might use a Bonferroni correction to adjust their alpha level from the conventional level of 0.050 to 0.025 when testing H1 and H2, find a significant result for H1 (p < 0.025) and not for H2 (p > 0.025), and so claim support for H1 and not for H2. However, these separate individual inferences do not require an alpha adjustment. Only a statistical inference about the union alternative hypothesis "H1 or H2" requires an alpha adjustment because it is based on "at least one" significant result among the two tests, and so it depends on the familywise error rate. When a researcher corrects their alpha level during multiple testing but does not make an inference about the union alternative hypothesis, their correction is redundant. In the present article, I discuss this redundant correction problem, including its reduction in statistical power for tests of individual hypotheses and its potential causes vis-à-vis error rate confusions and the alpha adjustment ritual. I also provide three illustrations of redundant corrections from recent psychology studies. I conclude that redundant corrections represent a symptom of statisticism, and I call for a more nuanced inference-based approach to multiple testing corrections.
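The arithmetic behind the H1/H2 example can be sketched in a few lines of Python. This is an illustrative sketch and not code from the article. It assumes the two tests are independent and uses a two-sided z-test as a stand-in for whatever test a study might run. The function names and the expected z-statistic of 2.8 are hypothetical choices made only for this illustration.

```python
from statistics import NormalDist


def familywise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive across m tests.

    Assumes (for illustration) that the m tests are independent and
    that every null hypothesis is true.
    """
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-test alpha level after a Bonferroni correction for m tests."""
    return alpha / m


def z_test_power(expected_z: float, alpha: float) -> float:
    """Power of a two-sided z-test for a single hypothesis.

    expected_z is the expected value of the z-statistic under the
    alternative hypothesis (a hypothetical effect used for illustration).
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    upper_tail = 1.0 - nd.cdf(z_crit - expected_z)
    lower_tail = nd.cdf(-z_crit - expected_z)
    return upper_tail + lower_tail


m = 2                                  # two hypotheses: H1 and H2
alpha = 0.05                           # conventional alpha level
adjusted = bonferroni_alpha(alpha, m)  # 0.025

# Error rate relevant to the union inference "H1 or H2".
print(round(familywise_error_rate(alpha, m), 4))     # 0.0975
print(round(familywise_error_rate(adjusted, m), 6))  # 0.049375

# Power for an individual inference about H1 alone, before and after
# the adjustment. The adjusted alpha gives the lower power.
print(z_test_power(2.8, alpha))
print(z_test_power(2.8, adjusted))
```

Under these assumptions, the unadjusted familywise error rate exceeds 0.05 only for the "at least one significant result" inference. The Type I error rate for a single test of H1 is already 0.05 without any adjustment, so lowering alpha to 0.025 for that individual inference only lowers its power.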
