U-statistics are a fundamental class of estimators that generalize the sample mean and underpin much of nonparametric statistics. Although they have been studied extensively in both statistics and probability, two key challenges remain: their high computational cost, addressed in part by incomplete U-statistics, and their non-standard asymptotic behavior in the degenerate case, which typically requires resampling methods for hypothesis testing. This paper presents a novel perspective on U-statistics, grounded in hypergraph theory and combinatorial designs. Our approach bypasses the traditional Hoeffding decomposition, which is the main analytical tool in this literature but is highly sensitive to degeneracy. By characterizing the dependence structure of a U-statistic, we derive a Berry-Esseen bound valid for incomplete U-statistics built on deterministic designs, yielding conditions under which Gaussian limiting distributions can be established even in degenerate cases and when the order diverges. We also introduce efficient algorithms to construct incomplete U-statistics based on equireplicate designs, a subclass of deterministic designs that, in certain cases, achieves minimum variance. Beyond its theoretical contributions, our framework provides a systematic way to construct permutation-free counterparts to tests based on degenerate U-statistics, as demonstrated in experiments with kernel-based tests using the Maximum Mean Discrepancy and the Hilbert-Schmidt Independence Criterion.
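To make the notions of an incomplete U-statistic and an equireplicate design concrete, the following minimal sketch evaluates an order-2 kernel on a deterministic subset of index pairs in which every observation appears equally often. It is an illustration under stated assumptions, not the construction algorithm of this paper: the cyclic design, the variance kernel h(x, y) = (x - y)^2 / 2, and all function names are hypothetical choices made for the example.

```python
import itertools


def variance_kernel(x, y):
    """Order-2 kernel whose complete U-statistic is the unbiased sample variance."""
    return 0.5 * (x - y) ** 2


def complete_u_statistic(sample, kernel):
    """Average the kernel over all n-choose-2 unordered index pairs."""
    pairs = list(itertools.combinations(range(len(sample)), 2))
    return sum(kernel(sample[i], sample[j]) for i, j in pairs) / len(pairs)


def cyclic_design(n, num_offsets):
    """Illustrative deterministic design (an assumption, not the paper's algorithm).

    Pairs index i with (i + s) mod n for s = 1..num_offsets. Requiring
    num_offsets < n / 2 guarantees that no pair is produced twice, so the
    design has n * num_offsets distinct pairs.
    """
    if not 1 <= num_offsets < n / 2:
        raise ValueError("need 1 <= num_offsets < n / 2")
    return [
        tuple(sorted((i, (i + s) % n)))
        for s in range(1, num_offsets + 1)
        for i in range(n)
    ]


def incomplete_u_statistic(sample, kernel, design):
    """Average the kernel over the pairs listed in the design only."""
    return sum(kernel(sample[i], sample[j]) for i, j in design) / len(design)


def replication_counts(n, design):
    """Count how many pairs of the design contain each index."""
    counts = [0] * n
    for i, j in design:
        counts[i] += 1
        counts[j] += 1
    return counts


if __name__ == "__main__":
    sample = [0.3, -1.2, 2.5, 0.7, 1.1, -0.4, 3.0, 1.9, -0.8, 0.2]
    design = cyclic_design(len(sample), num_offsets=2)
    # Equireplicate: every index appears in the same number of pairs.
    print(replication_counts(len(sample), design))
    print(incomplete_u_statistic(sample, variance_kernel, design))
    print(complete_u_statistic(sample, variance_kernel))
```

With n observations and k offsets, this design uses n·k kernel evaluations instead of n(n-1)/2, and each observation appears in exactly 2k pairs. For odd n and k = (n-1)/2 the design contains every pair, so the incomplete statistic coincides with the complete one.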