Attainability of Two-Point Testing Rates for Finite-Sample Location Estimation

Le Cam's two-point testing method yields perhaps the simplest lower bound for estimating the mean of a distribution: roughly, if a distribution centered at $μ$ cannot be reliably distinguished from the same distribution centered at $μ+Δ$, then the mean cannot be estimated to error better than $Δ/2$. Whether a nearly matching upper bound is attainable depends on the setting. We study the conditions under which the two-point testing lower bound can be attained for univariate mean estimation, both in location estimation (where the distribution is known up to translation) and in adaptive location estimation (where the distribution is unknown). Roughly, we say an estimate nearly attains the two-point testing lower bound if it incurs error that is at most polylogarithmically larger than the Hellinger modulus of continuity for $\tilde{Ω}(n)$ samples. Adaptive location estimation is particularly interesting because some distributions admit much better guarantees than sub-Gaussian rates (e.g. $\operatorname{Unif}(μ-1,μ+1)$ permits error $Θ(\frac{1}{n})$, while the sub-Gaussian rate is $Θ(\frac{1}{\sqrt{n}})$), yet it is not obvious whether these rates can be attained adaptively by one unified approach. Our main result is an algorithm that nearly attains the two-point testing rate for mixtures of symmetric, log-concave distributions with a common mean. Moreover, this algorithm runs in near-linear time and is parameter-free. In contrast, we show that the two-point testing rate is not nearly attainable even for symmetric, unimodal distributions. We complement this with results for location estimation, showing that the two-point testing rate is nearly attainable for unimodal distributions but unattainable for symmetric distributions.
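The gap between the $Θ(\frac{1}{n})$ and $Θ(\frac{1}{\sqrt{n}})$ rates for $\operatorname{Unif}(μ-1,μ+1)$ can be illustrated with a small simulation. The sketch below is not the algorithm from this work: it compares the sample mean against the midrange $(\min + \max)/2$, a classical estimator that is known to achieve error of order $1/n$ for the uniform distribution. The function name `simulate_errors` and all parameter values are illustrative assumptions.

```python
import random

def simulate_errors(n, trials=500, mu=0.0, seed=0):
    """Average absolute error of two estimators of mu on Unif(mu-1, mu+1).

    Returns (sample-mean error, midrange error), each averaged over
    `trials` independent datasets of size n. Illustration only; this is
    not the adaptive estimator described in the abstract.
    """
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    mean_err = 0.0
    mid_err = 0.0
    for _ in range(trials):
        xs = [rng.uniform(mu - 1.0, mu + 1.0) for _ in range(n)]
        # Sample mean: error shrinks like 1/sqrt(n).
        mean_err += abs(sum(xs) / n - mu)
        # Midrange: error shrinks like 1/n for the uniform distribution.
        mid_err += abs((min(xs) + max(xs)) / 2.0 - mu)
    return mean_err / trials, mid_err / trials

if __name__ == "__main__":
    for n in (100, 400, 1600):
        mean_err, mid_err = simulate_errors(n)
        print(f"n={n:5d}  mean error={mean_err:.5f}  midrange error={mid_err:.5f}")
```

Quadrupling $n$ should roughly halve the sample-mean error and roughly quarter the midrange error. The midrange is tailored to the uniform distribution, so it does not address the adaptive question of whether a single method can attain such rates without knowing the distribution.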
