The two-sample homogeneity testing problem is fundamental in statistics and becomes particularly challenging in high dimensions, where classical tests can suffer substantial power loss. We develop a learning-assisted procedure based on the projection 1-Wasserstein distance, which we call the neural Wasserstein test. The method is motivated by the observation that there often exists a low-dimensional projection under which the two high-dimensional distributions differ. In practice, we learn the projection directions via manifold optimization and a witness function using deep neural networks. To adapt to unknown projection dimensions and sparsity levels, we aggregate a collection of candidate statistics through a max-type construction, avoiding explicit tuning while potentially improving power. We establish the validity and consistency of the proposed test and prove a Berry--Esseen type bound for the Gaussian approximation. In particular, under the null hypothesis, the aggregated statistic converges to the absolute maximum of a standard Gaussian vector, yielding an asymptotically pivotal (distribution-free) calibration that bypasses resampling. Simulation studies and a real-data example demonstrate the strong finite-sample performance of the proposed method.
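To make the core construction concrete, here is a minimal sketch of the projected 1-Wasserstein statistic with a max-type aggregation. It uses random unit directions as a stand-in for the learned projection directions (the paper learns them via manifold optimization and a neural witness function, which this sketch does not implement), and exploits the fact that the 1-Wasserstein distance between two equal-size one-dimensional samples is the mean absolute difference of their order statistics. The function names `w1_1d` and `max_projected_w1` are illustrative, not from the paper.

```python
import numpy as np

def w1_1d(x, y):
    """1-Wasserstein distance between two equal-size 1-D samples:
    the mean absolute difference of their order statistics."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

def max_projected_w1(X, Y, n_dirs=50, seed=0):
    """Max-type aggregation of projected W1 statistics over random
    unit directions (a stand-in for learned directions)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    stats = []
    for _ in range(n_dirs):
        v = rng.standard_normal(d)
        v /= np.linalg.norm(v)          # unit vector on the sphere
        stats.append(w1_1d(X @ v, Y @ v))
    return max(stats)

# Toy example: high-dimensional samples that differ only along one coordinate.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 20))
Y = rng.standard_normal((500, 20))
Y[:, 0] += 2.0                          # mean shift in a single direction

print(max_projected_w1(X, X))           # exactly 0 under the null (same sample)
print(max_projected_w1(X, Y))           # noticeably larger under the alternative
```

In the paper's procedure, each candidate statistic would additionally be standardized and calibrated against the absolute maximum of a standard Gaussian vector; this sketch only shows the projection-and-maximize idea behind the aggregated statistic.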