Many scientific applications involve testing theories that are only partially specified. This task often amounts to testing the goodness of fit of a candidate distribution while allowing for reasonable deviations from it. The tolerant testing framework provides a systematic way of constructing such tests. Rather than testing the simple null hypothesis that the data were drawn from a candidate distribution, a tolerant test assesses whether the data are consistent with any distribution that lies within a given neighborhood of the candidate. As this neighborhood grows, the tolerance to misspecification increases, while the power of the test decreases. In this work, we characterize the information-theoretic trade-off between the size of the neighborhood and the power of the test in several canonical models. On the one hand, we characterize the optimal trade-off for tolerant testing in the Gaussian sequence model, under deviations measured in both smooth and non-smooth norms. On the other hand, we study nonparametric analogues of this problem in smooth regression and density models. Along the way, we establish the sub-optimality of the classical chi-squared statistic for tolerant testing, and study simple alternative hypothesis tests.
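For concreteness, the tolerant testing problem in the Gaussian sequence model can be sketched as follows. The notation here ($Y$, $\theta$, $\theta_0$, $\sigma$, $d$, $\epsilon_0$, $\epsilon_1$, and the norm $\|\cdot\|$) is assumed for illustration and may differ from the notation used in the body of the paper.

```latex
% Illustrative notation only (an assumption, not taken from the paper).
% epsilon_0: radius of the tolerated neighborhood around the candidate theta_0.
% epsilon_1: separation at which the test should reject.
\[
  Y \sim \mathcal{N}(\theta, \sigma^2 I_d), \qquad
  H_0 : \|\theta - \theta_0\| \le \epsilon_0
  \quad \text{versus} \quad
  H_1 : \|\theta - \theta_0\| \ge \epsilon_1,
  \qquad 0 \le \epsilon_0 < \epsilon_1 .
\]
```

Setting $\epsilon_0 = 0$ recovers the simple null hypothesis. In this notation, the trade-off concerns how small the separation $\epsilon_1$ can be, as a function of the tolerance $\epsilon_0$, while still permitting a test with nontrivial power.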