We derive the minimax rate for a mean location model in which the mean is constrained to a bounded star-shaped set $K \subseteq \mathbb{R}^n$, in an adversarially corrupted data setting with Gaussian noise. We assume that an unknown fraction $\epsilon<1/2-\kappa$ of the $N$ observations, for some fixed $\kappa\in(0,1/2]$, is arbitrarily corrupted. Under the squared $\ell_2$ loss, the minimax risk is, up to proportionality constants, $\max(\eta^{*2},\sigma^2\epsilon^2)\wedge d^2$ with \begin{align*} \eta^* = \sup \bigg\{\eta : \frac{N\eta^2}{\sigma^2} \leq \log M^{\operatorname{loc}}(\eta,c)\bigg\}, \end{align*} where $\log M^{\operatorname{loc}}(\eta,c)$ denotes the local entropy of the set $K$, $d$ is the diameter of $K$, $\sigma^2$ is the noise variance, and $c$ is some sufficiently large absolute constant. A variant of our algorithm achieves the same rate for settings with known or symmetric sub-Gaussian noise, albeit with a smaller breakdown point that is still of constant order. We further study the case of unknown sub-Gaussian noise and show that the rate is slightly slower: $\max(\eta^{*2},\sigma^2\epsilon^2\log(1/\epsilon))\wedge d^2$. Finally, we generalize our results to the case where $K$ is star-shaped but unbounded.
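To make the definition of $\eta^*$ concrete, the following is a minimal numerical sketch, not the algorithm of the paper. It assumes the local entropy is supplied as a user-defined, non-negative, non-increasing function of $\eta$ (the hypothetical argument `log_m_loc`), and it restricts the search to $(0, d]$, which is harmless for the rate because of the final cap at $d^2$. The function names `eta_star` and `minimax_rate`, and the constant entropy used in the example, are illustrative assumptions only.

```python
from typing import Callable


def eta_star(log_m_loc: Callable[[float], float], N: int, sigma: float, d: float) -> float:
    """Approximate sup{eta in (0, d] : N * eta^2 / sigma^2 <= log_m_loc(eta)}.

    Assumes log_m_loc is non-negative and non-increasing in eta, so the
    feasible set is an interval starting at 0 and bisection applies.
    """
    def feasible(eta: float) -> bool:
        return N * eta ** 2 / sigma ** 2 <= log_m_loc(eta)

    if feasible(d):
        return d  # the constraint never binds below the diameter
    lo, hi = 0.0, d
    for _ in range(200):  # fixed iteration count avoids tolerance pitfalls
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    return lo


def minimax_rate(eta: float, sigma: float, eps: float, d: float) -> float:
    """Evaluate max(eta^2, sigma^2 * eps^2) capped at d^2 (constants ignored)."""
    return min(max(eta ** 2, sigma ** 2 * eps ** 2), d ** 2)


# Hypothetical example: a constant local entropy of 50 (an assumption,
# chosen only so the answer has the closed form sigma * sqrt(50 / N)).
eta = eta_star(lambda e: 50.0, N=200, sigma=1.0, d=10.0)
rate = minimax_rate(eta, sigma=1.0, eps=0.1, d=10.0)
```

With a constant entropy $L$ the fixed point has the closed form $\eta^* = \sigma\sqrt{L/N}$, which gives a quick sanity check on the bisection; for a real set $K$ the entropy function would have to come from a separate calculation or bound.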