Given univariate random variables $Y_1, \ldots, Y_n$ drawn from the $\text{Uniform}(\theta_0 - 1, \theta_0 + 1)$ distribution, the sample midrange $\frac{Y_{(n)}+Y_{(1)}}{2}$ is an MLE for $\theta_0$ and estimates $\theta_0$ with error of order $1/n$, much smaller than the $1/\sqrt{n}$ error rate of the usual sample mean estimator. However, the sample midrange performs poorly when the data follow, say, the Gaussian $N(\theta_0, 1)$ distribution, where its error rate is $1/\sqrt{\log n}$. In this paper, we propose an estimator of the location $\theta_0$ whose rate of convergence can, in many settings, adapt to the underlying distribution, which we assume to be symmetric around $\theta_0$ but otherwise unknown. When the underlying distribution is compactly supported, we show that our estimator attains a rate of convergence of $n^{-\frac{1}{\alpha}}$ up to polylog factors, where the rate parameter $\alpha$ can take on any value in $(0, 2]$ and depends on the moments of the underlying distribution. Our estimator is the $\ell^\gamma$-center of the data, for a $\gamma\geq2$ chosen in a data-driven way by minimizing a criterion motivated by the asymptotic variance. Our approach can be directly applied to the regression setting where $\theta_0$ is a function of observed features, and it motivates the use of the $\ell^\gamma$ loss function with $\gamma > 2$ in certain settings.
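To make the $\ell^\gamma$-center concrete, the following is a minimal sketch rather than the paper's implementation. It assumes the $\ell^\gamma$-center means $\arg\min_\theta \sum_i |Y_i - \theta|^\gamma$, so that $\gamma = 2$ gives the sample mean and large $\gamma$ approaches the sample midrange. The selection rule is also an assumption: it plugs sample moments into the textbook asymptotic variance of an M-estimator with loss $|x|^\gamma$, namely $E|X|^{2(\gamma-1)} / \big((\gamma-1)\,E|X|^{\gamma-2}\big)^2$, and may differ from the criterion actually used in the paper. The function names and the grid of candidate $\gamma$ values are hypothetical.

```python
import math
import random


def l_gamma_center(ys, gamma, iters=100):
    """Minimize sum(|y - theta|**gamma) over theta, for gamma >= 2.

    The objective is convex, so its derivative is monotone in theta and the
    minimizer can be bracketed in [min(ys), max(ys)] and found by bisection.
    """
    lo, hi = min(ys), max(ys)
    scale = (hi - lo) or 1.0  # rescale residuals so large gamma does not overflow

    def score(theta):
        # Proportional to minus the derivative of the objective; decreasing in theta.
        return sum(
            math.copysign(abs((y - theta) / scale) ** (gamma - 1), y - theta)
            for y in ys
        )

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid  # objective still decreasing: minimizer lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def plug_in_variance(ys, gamma):
    """Plug-in estimate of the M-estimator asymptotic variance (assumed criterion)."""
    theta = l_gamma_center(ys, gamma)
    r = [abs(y - theta) for y in ys]
    n = len(ys)
    num = sum(x ** (2 * (gamma - 1)) for x in r) / n
    den = (gamma - 1) * sum(x ** (gamma - 2) for x in r) / n
    return num / den ** 2


def choose_gamma(ys, grid=(2, 3, 4, 6, 8)):
    """Pick the gamma in a hypothetical grid with the smallest plug-in variance."""
    return min(grid, key=lambda g: plug_in_variance(ys, g))


if __name__ == "__main__":
    random.seed(0)
    theta0 = 3.0
    sample = [random.uniform(theta0 - 1, theta0 + 1) for _ in range(2000)]
    gamma_hat = choose_gamma(sample)
    print("chosen gamma:", gamma_hat)
    print("l^gamma center:", l_gamma_center(sample, gamma_hat))
    print("sample mean:", sum(sample) / len(sample))
    print("sample midrange:", 0.5 * (min(sample) + max(sample)))
```

For $\text{Uniform}(\theta_0 - 1, \theta_0 + 1)$ data the assumed criterion equals $1/(2\gamma - 1)$ in population, so it decreases in $\gamma$ and favors larger exponents, whereas for Gaussian data it is minimized at $\gamma = 2$. A finite grid is used here only for simplicity; the plug-in moments become noisy for large $\gamma$, so the range of candidates would need care in practice.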