This paper addresses the statistical problem of estimating the infinity-norm deviation of the empirical mean from the distribution mean for high-dimensional distributions on $\{0,1\}^d$, potentially with $d=\infty$. Unlike traditional bounds such as those in the classical Glivenko-Cantelli theorem, we explore the instance-dependent convergence behavior. For product distributions, we provide the exact non-asymptotic behavior of the expected maximum deviation, revealing various regimes of decay. In particular, these tight bounds demonstrate the necessity of a previously proposed factor for an upper bound, answering a corresponding COLT 2023 open problem. We also consider general distributions on $\{0,1\}^d$ and provide the tightest possible bounds for the maximum deviation of the empirical mean given only the mean statistic. Along the way, we prove a localized version of the Dvoretzky-Kiefer-Wolfowitz inequality. Additionally, we present some results for two other cases: one where the deviation is measured in a $q$-norm, and one where the distribution is supported on the continuous domain $[0,1]^d$. We also provide some high-probability bounds for the maximum deviation in the independent Bernoulli case.
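To make the central quantity concrete, the following is a minimal Monte Carlo sketch, not the paper's method, that estimates the expected maximum deviation $\mathbb{E}\max_j |\hat{p}_j - p_j|$ for a product distribution on $\{0,1\}^d$ with finite $d$. The function name `expected_max_deviation`, the parameters `trials` and `seed`, and the geometrically decaying mean vector are all assumptions chosen purely for illustration.

```python
import random


def expected_max_deviation(p, n, trials=200, seed=0):
    """Monte Carlo estimate of E max_j |p_hat_j - p_j|.

    p      -- list of coordinate means of a product (independent Bernoulli) distribution
    n      -- number of i.i.d. samples used to form the empirical mean
    trials -- number of independent repetitions to average over (illustrative default)
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        worst = 0.0
        for pj in p:
            # Coordinates are independent, so each one can be sampled on its own:
            # count successes among n Bernoulli(pj) draws.
            successes = sum(1 for _ in range(n) if rng.random() < pj)
            worst = max(worst, abs(successes / n - pj))
        total += worst
    return total / trials


if __name__ == "__main__":
    # Hypothetical instance: means decaying geometrically across 20 coordinates.
    p = [0.5 ** (j + 1) for j in range(20)]
    for n in (10, 100, 1000):
        print(n, expected_max_deviation(p, n))
```

Running the sketch for several sample sizes and several mean vectors gives a rough empirical picture of how the decay of the deviation depends on the particular instance $p$, which is the behavior the bounds above characterize exactly.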