Concentration inequalities for the sample mean, like those due to Bernstein and Hoeffding, are valid for any sample size but overly conservative, yielding confidence intervals that are unnecessarily wide. The central limit theorem (CLT) provides asymptotic confidence intervals with optimal width, but these carry no validity guarantee at any finite sample size. To resolve this tension, we develop new computable concentration inequalities with asymptotically optimal size, finite-sample validity, and sub-Gaussian decay. These bounds enable the construction of efficient confidence intervals with correct coverage for any sample size and efficient empirical Berry-Esseen bounds that require no prior knowledge of the population variance. We derive our inequalities by tightly bounding non-uniform Kolmogorov and Wasserstein distances to a Gaussian using zero-bias couplings and Stein's method of exchangeable pairs.
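To make the width gap described above concrete, the following minimal sketch compares the half-width of a two-sided Hoeffding interval with that of the asymptotic CLT interval for the mean of i.i.d. observations bounded in [0, 1]. It illustrates only the two classical baselines, not the inequalities developed in this work. The function names `hoeffding_half_width` and `clt_half_width` are hypothetical, and the standard deviation `sigma` is assumed known purely for illustration.

```python
import math
from statistics import NormalDist


def hoeffding_half_width(n, delta, value_range=1.0):
    """Half-width of the two-sided Hoeffding interval at level 1 - delta.

    Assumes n i.i.d. observations lying in an interval of length value_range.
    Valid for every n, but ignores the variance entirely.
    """
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def clt_half_width(n, delta, sigma):
    """Half-width of the asymptotic CLT interval at nominal level 1 - delta.

    Assumes the standard deviation sigma is known (an illustrative
    simplification); coverage is only guaranteed in the limit n -> infinity.
    """
    z = NormalDist().inv_cdf(1.0 - delta / 2.0)  # standard normal quantile
    return z * sigma / math.sqrt(n)


if __name__ == "__main__":
    delta = 0.05   # nominal miscoverage level
    sigma = 0.1    # hypothetical low-variance population on [0, 1]
    for n in (100, 1000, 10000):
        h = hoeffding_half_width(n, delta)
        c = clt_half_width(n, delta, sigma)
        print(f"n={n:6d}  Hoeffding={h:.4f}  CLT={c:.4f}  ratio={h / c:.2f}")
```

Both half-widths shrink at rate 1/sqrt(n), so their ratio does not depend on n: the Hoeffding interval stays wider by a constant factor that grows as the variance shrinks, which is the conservativeness that the abstract refers to.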