A recurring question is whether $Y$ can be predicted from $X$ using a given model. Especially for highly flexible models such as neural networks, one may ask whether a seemingly good prediction is actually better than fitting pure noise or whether it must be attributed to the flexibility of the model. This paper proposes a rigorous permutation test to assess whether the prediction is better than the prediction of pure noise. The test avoids any sample splitting and is instead based on generating new pairings $(X_i, Y_j)$. The paper introduces a new formulation of the null hypothesis and a rigorous justification for the test, which distinguishes it from previous literature. The theoretical findings are applied both to simulated data and to sensor data of tennis serves in an experimental context. The simulation study underscores how the available information affects the test: the less informative the predictors, the lower the probability of rejecting the null hypothesis of fitting pure noise, and detecting weaker dependence between variables requires a sufficiently large sample size.
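To make the idea of re-pairing $(X_i, Y_j)$ without sample splitting concrete, the following is a minimal sketch of a generic permutation test on the in-sample prediction error. It is not the paper's exact procedure or its formulation of the null hypothesis. The polynomial least-squares model, the function names `in_sample_mse` and `permutation_p_value`, and the parameters `degree`, `n_perm`, and `seed` are all assumptions chosen for illustration.

```python
import numpy as np


def in_sample_mse(X, y, degree=3):
    """Fit a polynomial least-squares model on (X, y); return its in-sample MSE.

    Assumption: X is one-dimensional and the 'flexible model' is a polynomial
    of the given degree, standing in for e.g. a neural network.
    """
    # Design matrix with columns 1, x, x^2, ..., x^degree.
    A = np.vander(X, degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(np.mean(resid ** 2))


def permutation_p_value(X, y, n_perm=500, degree=3, seed=0):
    """Permutation p-value for 'the fit is no better than a fit to pure noise'.

    The same model is refitted on the full sample for every random re-pairing
    (X_i, Y_pi(i)); no data are held out.
    """
    rng = np.random.default_rng(seed)
    observed = in_sample_mse(X, y, degree)
    count = 0
    for _ in range(n_perm):
        y_perm = rng.permutation(y)  # break any dependence between X and Y
        if in_sample_mse(X, y_perm, degree) <= observed:
            count += 1
    # The +1 terms count the observed pairing as one of the permutations.
    return (1 + count) / (1 + n_perm)


# Hypothetical usage on synthetic data (illustrative only).
rng = np.random.default_rng(1)
X_demo = rng.uniform(-1.0, 1.0, size=100)
y_signal = np.sin(2.0 * X_demo) + 0.1 * rng.normal(size=100)  # Y depends on X
y_noise = rng.normal(size=100)                                # Y independent of X

p_signal = permutation_p_value(X_demo, y_signal, n_perm=200)
p_noise = permutation_p_value(X_demo, y_noise, n_perm=200)
```

Because the model is refitted on every permuted sample, its flexibility affects the observed error and the permuted errors alike, which is why the comparison does not need a held-out set. In this sketch a small `p_signal` suggests the fit exploits real dependence, whereas `p_noise` carries no such evidence.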