A question that often arises is whether $Y$ can be predicted from $X$ with a given model. For highly flexible models such as neural networks in particular, one may ask whether a seemingly good prediction is genuinely better than a fit to pure noise, or whether it merely reflects the flexibility of the model. This paper proposes a rigorous permutation test to assess whether the prediction is better than the prediction of pure noise. The test avoids any sample splitting and is instead based on generating new pairings $(X_i, Y_j)$. The paper introduces a new formulation of the null hypothesis and a rigorous justification for the test, which distinguishes it from previous literature. The theoretical findings are applied both to simulated data and to sensor data of tennis serves collected in an experimental setting. The simulation study underscores how the available information affects the test: the less informative the predictors, the lower the probability of rejecting the null hypothesis of fitting pure noise, and detecting weaker dependence between variables requires a sufficiently large sample.
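The general idea of a permutation test without sample splitting can be illustrated with a minimal sketch. It is not the paper's procedure: the test statistic (in-sample mean squared error), the stand-in flexible model (a degree-5 polynomial fitted by least squares), and the names `in_sample_mse` and `permutation_p_value` are all assumptions made for illustration. The sketch refits the model on randomly re-paired data $(X_i, Y_{\pi(i)})$ and reports how often the fit to the re-paired data is at least as good as the fit to the original pairing.

```python
import numpy as np


def in_sample_mse(x, y, degree=5):
    """Fit a polynomial by least squares and return the training MSE.

    Hypothetical stand-in for any flexible model; no data are held out.
    """
    design = np.vander(x, degree + 1)  # columns x^degree, ..., x^0
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.mean((y - design @ coef) ** 2))


def permutation_p_value(x, y, n_perm=200, degree=5, seed=0):
    """Permutation p-value for 'the fit is no better than a fit to noise'.

    Assumed statistic: in-sample MSE (smaller means a better fit).
    """
    rng = np.random.default_rng(seed)
    observed = in_sample_mse(x, y, degree)
    # Refit on re-paired data (x_i, y_pi(i)), which breaks any dependence.
    perm_errors = np.array(
        [in_sample_mse(x, rng.permutation(y), degree) for _ in range(n_perm)]
    )
    # The +1 terms count the observed pairing as one of the permutations.
    return float((1 + np.sum(perm_errors <= observed)) / (n_perm + 1))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(-1.0, 1.0, size=100)
    y_dependent = np.sin(3.0 * x) + 0.1 * rng.normal(size=100)
    y_noise = rng.normal(size=100)  # generated independently of x
    print("dependent:", permutation_p_value(x, y_dependent))
    print("pure noise:", permutation_p_value(x, y_noise))
```

Because the model is refitted on every permuted sample, its flexibility is present in the reference distribution as well, so a small training error alone does not produce a small p-value.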