The widespread use of black-box prediction methods has sparked growing interest in algorithm- and model-agnostic approaches for quantifying goodness-of-fit, with direct ties to specification testing, model selection, and variable importance assessment. A commonly used framework defines a predictiveness criterion, applies a cross-fitting procedure to estimate the predictiveness, and uses the difference in estimated predictiveness between two models as the test statistic. However, even after standardization, this test statistic typically fails to converge to a non-degenerate distribution under the null hypothesis of equal goodness-of-fit, a problem known as the degeneracy issue. To address it, we present a simple yet effective device, Zipper. It draws inspiration from the strategy of additionally splitting the testing data, but encourages an overlap between the two testing data splits used in predictiveness evaluation. Zipper binds the two overlapping splits together using a slider parameter that controls the proportion of overlap. Our proposed test statistic is asymptotically normal under the null hypothesis for any fixed slider value, which guarantees valid size control while enhancing power through effective data reuse. Finite-sample experiments demonstrate that our procedure, with a simple choice of the slider, works well across a wide range of settings.
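The overlapping-split idea can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the abstract: the function names `zipper_splits` and `predictiveness_difference`, the symbol `tau` for the slider, the restriction of `tau` to [0, 1), the choice of negative mean squared error as the predictiveness criterion, and the particular way the overlap size is derived from `tau`. The standardization step and the cross-fitting loop are not reproduced, so this shows only how two testing splits might share a controlled fraction of their points.

```python
import numpy as np


def zipper_splits(test_idx, tau, rng=None):
    """Return two index arrays that share roughly a fraction `tau` of points.

    Hypothetical construction: the testing fold is cut into a shared block
    and two equally sized private blocks; each split is the shared block
    plus one private block, so overlap / split size is approximately tau.
    """
    if not 0.0 <= tau < 1.0:
        raise ValueError("tau is assumed to lie in [0, 1)")
    rng = np.random.default_rng(rng)
    idx = rng.permutation(np.asarray(test_idx))
    n = idx.size
    # Solve n_overlap / (n_overlap + n_side) = tau with n = n_overlap + 2 * n_side.
    n_overlap = int(np.floor(tau * n / (2.0 - tau)))
    n_side = (n - n_overlap) // 2
    overlap = idx[:n_overlap]
    side_a = idx[n_overlap:n_overlap + n_side]
    side_b = idx[n_overlap + n_side:n_overlap + 2 * n_side]
    return np.concatenate([overlap, side_a]), np.concatenate([overlap, side_b])


def predictiveness_difference(y, pred_full, pred_reduced, split_a, split_b):
    """Difference in estimated predictiveness, one model per split.

    Predictiveness is taken to be negative mean squared error (an assumption).
    """
    v_full = -np.mean((y[split_a] - pred_full[split_a]) ** 2)
    v_reduced = -np.mean((y[split_b] - pred_reduced[split_b]) ** 2)
    return v_full - v_reduced


if __name__ == "__main__":
    split_a, split_b = zipper_splits(np.arange(100), tau=0.5, rng=0)
    shared = np.intersect1d(split_a, split_b).size
    print(split_a.size, split_b.size, shared)
```

In this sketch, `tau = 0` gives two disjoint splits (plain additional sample splitting), while larger values of `tau` reuse more testing points in both predictiveness estimates.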