We study the problem of estimating the mean of an identity-covariance Gaussian in the truncated setting, in the regime where the truncation set comes from a low-complexity family $\mathcal{C}$ of sets. Specifically, for a fixed but unknown truncation set $S \subseteq \mathbb{R}^d$ with $S \in \mathcal{C}$, we are given access to samples from the distribution $\mathcal{N}(\boldsymbol{\mu}, \mathbf{I})$ truncated to the set $S$. The goal is to estimate $\boldsymbol{\mu}$ to within accuracy $\epsilon>0$ in $\ell_2$-norm. Our main result is a Statistical Query (SQ) lower bound suggesting a super-polynomial information-computation gap for this task. In more detail, we show that the complexity of any SQ algorithm for this problem is at least $d^{\mathrm{poly}(1/\epsilon)}$, even when the class $\mathcal{C}$ is simple enough that $\mathrm{poly}(d/\epsilon)$ samples information-theoretically suffice. Concretely, our SQ lower bound applies when $\mathcal{C}$ consists of unions of a bounded number of rectangles, a class whose VC dimension and Gaussian surface area are small. As a corollary of our construction, it also follows that the complexity of the previously known algorithm for this task is qualitatively best possible.
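To make the sampling model concrete, the following is a minimal illustrative sketch, not an algorithm from this work. It assumes a two-dimensional example, a truncation set $S$ given as a union of two axis-aligned rectangles, and plain rejection sampling; the function names, the choice of $S$, and the sample size are all hypothetical. It shows that the empirical mean of truncated samples can be far from $\boldsymbol{\mu}$, which is why the estimation task is nontrivial when $S$ is unknown.

```python
import random


def in_union_of_rectangles(x, rectangles):
    """Return True if point x lies in at least one axis-aligned rectangle.

    Each rectangle is a list of (low, high) intervals, one per coordinate.
    """
    return any(
        all(lo <= xi <= hi for xi, (lo, hi) in zip(x, rect))
        for rect in rectangles
    )


def sample_truncated_gaussian(mu, rectangles, n, rng):
    """Draw n samples from N(mu, I) conditioned on falling in the union S.

    Uses rejection sampling: draw from N(mu, I), keep only points in S.
    """
    samples = []
    while len(samples) < n:
        x = [rng.gauss(m, 1.0) for m in mu]
        if in_union_of_rectangles(x, rectangles):
            samples.append(x)
    return samples


def empirical_mean(samples):
    """Coordinate-wise average of a non-empty list of points."""
    d = len(samples[0])
    return [sum(s[i] for s in samples) / len(samples) for i in range(d)]


# Hypothetical example: true mean at the origin, S a union of two rectangles.
rng = random.Random(0)
mu = [0.0, 0.0]
S = [
    [(0.0, 2.0), (-1.0, 1.0)],    # rectangle 1: x1 in [0, 2],     x2 in [-1, 1]
    [(-3.0, -2.5), (-1.0, 1.0)],  # rectangle 2: x1 in [-3, -2.5], x2 in [-1, 1]
]
samples = sample_truncated_gaussian(mu, S, 5000, rng)
mean = empirical_mean(samples)

# The first coordinate of the empirical mean is biased away from mu[0] = 0,
# because S is not symmetric around mu in that coordinate.
print("true mean:", mu)
print("empirical mean of truncated samples:", mean)
```

In this example the set $S$ is symmetric around $\boldsymbol{\mu}$ in the second coordinate but not in the first, so only the first coordinate of the naive average is noticeably biased.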