Randomization tests rely on simple data transformations and possess an appealing robustness property. In addition to being finite-sample valid if the data distribution is invariant under the transformation, these tests can be asymptotically valid under a suitable studentization of the test statistic, even if the invariance does not hold. However, practical implementation often encounters noisy data, resulting in approximate randomization tests that may not be as robust. In this paper, our key theoretical contribution is a non-asymptotic bound on the discrepancy between the size of an approximate randomization test and the size of the original randomization test using noiseless data. This allows us to derive novel conditions for the validity of approximate randomization tests under data invariances, while being able to leverage existing results based on studentization if the invariance does not hold. We illustrate our theory through several examples, including tests of significance in linear regression. Our theory can explain certain aspects of how randomization tests perform in small samples, addressing limitations of prior theoretical results.
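To make the notion of an approximate randomization test concrete, here is a minimal sketch for a significance test in linear regression. It is an illustration under stated assumptions, not the procedure analyzed in the paper: the function names `randomization_pvalue` and `residual_permutation_test` are hypothetical, the errors are assumed exchangeable under the null, and the statistic is a simple unstudentized cross-product. The test is "approximate" because the true errors are unobserved, so residuals from the restricted fit are permuted in their place.

```python
import numpy as np


def randomization_pvalue(t_obs, t_draws):
    """Monte Carlo p-value; the +1 terms account for the identity transformation."""
    t_draws = np.asarray(t_draws, dtype=float)
    return (1 + np.sum(t_draws >= t_obs)) / (1 + t_draws.size)


def residual_permutation_test(y, x, Z, n_draws=999, seed=0):
    """Sketch of an approximate permutation test of H0: b = 0 in y = Z @ g + b * x + e.

    Assumption (not from the source): errors e are exchangeable under H0.
    The errors are not observed, so the residuals of y on Z stand in for them.
    """
    rng = np.random.default_rng(seed)

    # Residuals from the restricted model (H0 imposed): a noisy proxy for e.
    g_hat = np.linalg.lstsq(Z, y, rcond=None)[0]
    resid = y - Z @ g_hat

    # Partial Z out of x so the statistic targets the coefficient on x only.
    x_res = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

    def stat(e):
        # Unstudentized statistic, chosen for brevity (an assumption).
        return abs(x_res @ e)

    t_obs = stat(resid)
    t_draws = [stat(rng.permutation(resid)) for _ in range(n_draws)]
    return randomization_pvalue(t_obs, t_draws)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 200
    Z = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + control
    x = rng.normal(size=n)
    y = Z @ np.array([1.0, 0.5]) + 2.0 * x + rng.normal(size=n)
    print(residual_permutation_test(y, x, Z))
```

With the noiseless errors in place of `resid`, the same routine would be an exact randomization test under exchangeability. The gap between the two versions is the kind of discrepancy that a non-asymptotic bound on test size is meant to control. A studentized statistic could be substituted in `stat` when the invariance is in doubt.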