Learning to differentiate model distributions from observed data is a fundamental problem in statistics and machine learning, and high-dimensional data remains a challenging setting for such problems. Metrics that quantify the disparity between probability distributions, such as the Stein discrepancy, play an important role in high-dimensional statistical testing. In this paper, we investigate the role of $L^2$ regularization in training a neural network Stein critic to distinguish between data sampled from an unknown probability distribution and a nominal model distribution. Drawing a connection to Neural Tangent Kernel (NTK) theory, we develop a novel procedure for staging the regularization weight over training time, which leverages the advantages of highly regularized training at early times. Theoretically, we prove that when the $L^2$ regularization weight is large, the training dynamics are approximated by kernel optimization, namely ``lazy training'', and that training on $n$ samples converges at a rate of $O(n^{-1/2})$ up to a log factor. The result guarantees learning the optimal critic, provided it is sufficiently aligned with the leading eigenmodes of the zero-time NTK. The benefit of staged $L^2$ regularization is demonstrated on simulated high-dimensional data and in an application to evaluating generative models of image data.
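To make the role of the regularization weight concrete, here is a minimal sketch, not the paper's implementation. It assumes the commonly used regularized Stein critic objective $-\mathbb{E}_q[\mathcal{T}_p f] + \frac{\lambda}{2}\mathbb{E}_q[\|f\|^2]$, where $\mathcal{T}_p f(x) = s_p(x)^\top f(x) + \nabla\cdot f(x)$ and $s_p = \nabla \log p$; its population optimum is $f^* = (s_p - s_q)/\lambda$. For brevity a linear critic $f(x) = Wx + b$ stands in for the neural network (so the NTK aspect is not illustrated), the nominal model is assumed to be $p = \mathcal{N}(0, I)$, and the geometric decay from $\lambda_0 = 10$ to $\lambda_{\text{final}} = 1$ as well as all function names are hypothetical choices made for illustration.

```python
import numpy as np


def score_p(x):
    # Score of the assumed nominal model p = N(0, I): grad log p(x) = -x.
    return -x


def stein_term(x, W, b):
    # Stein operator applied to the linear critic f(x) = W x + b:
    # T_p f(x) = s_p(x)^T f(x) + div f(x), and div f = trace(W) here.
    f = x @ W.T + b
    return np.sum(score_p(x) * f, axis=1) + np.trace(W)


def lambda_schedule(t, lam0=10.0, lam_final=1.0, decay_steps=500):
    # Hypothetical staging: start heavily regularized, decay lambda
    # geometrically to lam_final over decay_steps iterations, then hold it.
    if t >= decay_steps:
        return lam_final
    return lam0 * (lam_final / lam0) ** (t / decay_steps)


def train_critic(x, steps=3000, lr=0.02):
    # Full-batch gradient descent on
    #   loss = -mean(T_p f) + (lam / 2) * mean(||f||^2)
    # with lam following the staged schedule above.
    n, d = x.shape
    W = np.zeros((d, d))
    b = np.zeros(d)
    for t in range(steps):
        lam = lambda_schedule(t)
        f = x @ W.T + b
        g = (-score_p(x) + lam * f) / n   # dLoss/df for each sample
        grad_W = g.T @ x - np.eye(d)      # the -trace(W) term contributes -I
        grad_b = g.sum(axis=0)
        W -= lr * grad_W
        b -= lr * grad_b
    return W, b, lambda_schedule(steps)


# Usage: data from q = N(mu, I), so s_p - s_q = -mu everywhere.
rng = np.random.default_rng(0)
mu = np.array([1.0, -0.5])
x_train = rng.normal(size=(2000, 2)) + mu
W, b, lam = train_critic(x_train)

# lam * f should approximate s_p - s_q = -mu (with W close to 0), and
# lam * E_q[T_p f] on fresh samples should approximate ||mu||^2 = 1.25.
x_test = rng.normal(size=(2000, 2)) + mu
stat = lam * stein_term(x_test, W, b).mean()
print(lam * b, stat)
```

Rescaling by $\lambda$ before reading off the critic or the test statistic removes the $1/\lambda$ factor in $f^*$, which is what allows the weight to be large early in training and reduced later without changing the target being estimated.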