Goodness-of-fit tests are crucial tools for assessing the validity of statistical models. In this paper, we introduce the Spectral Smooth Test (SST), a novel approach that generalizes Neyman's smooth test to high-dimensional data. While goodness-of-fit tests for univariate data are well established, extending them to high-dimensional data such as images, trajectories, and single-nucleotide polymorphisms (SNPs) poses significant challenges. SST models multivariate distributions using spectral bases, which adapt naturally to the geometry of the feature space. Unlike traditional orthogonal bases, these spectral bases are tailored to the data distribution, enabling more effective function modeling. The SST framework also offers a principled way to estimate the underlying model, so it provides actionable insights even when the null hypothesis is rejected. We present experimental results demonstrating the robustness of SST across a range of tuning parameter choices and compare its performance against other goodness-of-fit tests. Finally, we apply SST to the MNIST dataset as a real-world example of its effectiveness in high-dimensional settings.
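For orientation, the classical univariate Neyman smooth test that SST generalizes can be sketched as follows. This is not the SST procedure itself, only a minimal illustration of the baseline under the usual assumptions: the data have already been mapped to [0, 1] by the null distribution function, the basis is the orthonormal shifted Legendre polynomials, and the statistic is compared with a chi-square distribution whose degrees of freedom equal the chosen order. The function names (`shifted_legendre`, `neyman_smooth_statistic`, `chi2_sf_even`) and the default order of 4 are illustrative choices, not taken from the paper.

```python
import math
import random


def shifted_legendre(j, u):
    """Orthonormal Legendre polynomial of degree j on [0, 1], evaluated at u."""
    if j == 0:
        return 1.0
    x = 2.0 * u - 1.0          # map [0, 1] to [-1, 1]
    p_prev, p_curr = 1.0, x    # P_0 and P_1
    for n in range(1, j):
        # Bonnet recurrence: (n + 1) P_{n+1} = (2n + 1) x P_n - n P_{n-1}
        p_prev, p_curr = p_curr, ((2 * n + 1) * x * p_curr - n * p_prev) / (n + 1)
    return math.sqrt(2 * j + 1) * p_curr


def neyman_smooth_statistic(u_values, order=4):
    """Sum of squared normalized basis scores for H0: u_values ~ Uniform(0, 1)."""
    n = len(u_values)
    stat = 0.0
    for j in range(1, order + 1):
        score = sum(shifted_legendre(j, u) for u in u_values) / math.sqrt(n)
        stat += score ** 2
    return stat


def chi2_sf_even(x, df):
    """Chi-square survival function; this closed form holds for even df only."""
    if df % 2 != 0:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


# Illustrative data (assumed, not from the paper): one sample that satisfies
# the null hypothesis and one that is skewed toward zero.
rng = random.Random(0)
uniform_sample = [rng.random() for _ in range(500)]
skewed_sample = [rng.random() ** 2 for _ in range(500)]

for name, sample in [("uniform", uniform_sample), ("skewed", skewed_sample)]:
    stat = neyman_smooth_statistic(sample, order=4)
    print(name, round(stat, 3), chi2_sf_even(stat, 4))
```

Here the basis is fixed in advance and lives on the unit interval, which is the limitation the abstract points to: in high dimensions SST replaces it with spectral bases adapted to the data distribution.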