Outlier Robust Multivariate Polynomial Regression

We study the problem of robust multivariate polynomial regression: let $p\colon\mathbb{R}^n\to\mathbb{R}$ be an unknown $n$-variate polynomial of degree at most $d$ in each variable. We are given as input a set of random samples $(\mathbf{x}_i,y_i) \in [-1,1]^n \times \mathbb{R}$ that are noisy versions of $(\mathbf{x}_i,p(\mathbf{x}_i))$. More precisely, each $\mathbf{x}_i$ is sampled independently from some distribution $\chi$ on $[-1,1]^n$, and for each $i$ independently, $y_i$ is arbitrary (i.e., an outlier) with probability at most $\rho < 1/2$, and otherwise satisfies $|y_i-p(\mathbf{x}_i)|\leq\sigma$. The goal is to output a polynomial $\hat{p}$ of degree at most $d$ in each variable whose $\ell_\infty$-distance from $p$ is at most $O(\sigma)$. Kane, Karmalkar, and Price [FOCS'17] solved this problem for $n=1$. We generalize their results to the $n$-variate setting, giving an algorithm with sample complexity $O_n(d^n\log d)$ when $\chi$ is the $n$-dimensional Chebyshev distribution, where the hidden constant depends on $n$. If the samples are instead drawn from the uniform distribution, the sample complexity is $O_n(d^{2n}\log d)$. The approximation error is guaranteed to be at most $O(\sigma)$, and the run-time depends on $\log(1/\sigma)$. In the setting where each $\mathbf{x}_i$ and $y_i$ is known up to $N$ bits of precision, the run-time depends linearly on $N$. We also show that our sample complexities are optimal in terms of $d^n$. Furthermore, we show that the run-time can be made independent of $1/\sigma$, at the cost of a higher sample complexity.
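To make the sampling model concrete, the following is a minimal Python sketch of how a synthetic problem instance might be generated. It only produces inputs and is not the authors' regression algorithm. It rests on several assumptions: the $n$-dimensional Chebyshev distribution is taken to be the product of one-dimensional arcsine densities $1/(\pi\sqrt{1-x^2})$, sampled coordinate-wise as $\cos(\pi U)$ with $U$ uniform on $[0,1]$; "arbitrary" outliers are simulated by large random values, although in the model they may be chosen adversarially; each point is an outlier with probability exactly $\rho$ (the model allows at most $\rho$). The helper names `random_poly`, `sample_chebyshev`, and `generate_samples` are hypothetical.

```python
import itertools
import math
import random


def eval_poly(coeffs, x):
    """Evaluate sum over alpha of c_alpha * prod_j x_j**alpha_j."""
    total = 0.0
    for alpha, c in coeffs.items():
        term = c
        for xj, aj in zip(x, alpha):
            term *= xj ** aj
        total += term
    return total


def random_poly(n, d, rng):
    """Hypothetical test polynomial: one random coefficient per monomial
    whose degree in each of the n variables is at most d ((d+1)**n terms)."""
    return {alpha: rng.uniform(-1.0, 1.0)
            for alpha in itertools.product(range(d + 1), repeat=n)}


def sample_chebyshev(n, rng):
    """One draw from the (assumed) product Chebyshev distribution on [-1,1]^n:
    each coordinate is cos(pi * U) with U ~ Uniform[0, 1], independently."""
    return tuple(math.cos(math.pi * rng.random()) for _ in range(n))


def generate_samples(p, n, m, rho, sigma, rng, outlier_scale=1e6):
    """Draw m pairs (x_i, y_i). With probability rho, y_i is an outlier
    (here a large random value standing in for an arbitrary one);
    otherwise |y_i - p(x_i)| <= sigma."""
    samples, is_outlier = [], []
    for _ in range(m):
        x = sample_chebyshev(n, rng)
        if rng.random() < rho:
            y = rng.uniform(-outlier_scale, outlier_scale)
            is_outlier.append(True)
        else:
            y = eval_poly(p, x) + rng.uniform(-sigma, sigma)
            is_outlier.append(False)
        samples.append((x, y))
    return samples, is_outlier


if __name__ == "__main__":
    rng = random.Random(0)
    p = random_poly(n=2, d=3, rng=rng)
    samples, flags = generate_samples(p, n=2, m=1000, rho=0.2, sigma=0.01, rng=rng)
    print(len(samples), sum(flags))
```

Replacing `sample_chebyshev` with a coordinate-wise `rng.uniform(-1.0, 1.0)` gives the uniform-distribution variant mentioned above. The Chebyshev draws place more mass near the boundary of $[-1,1]^n$, where polynomials of high degree can vary most rapidly.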