Tractability from overparametrization: The example of the negative perceptron

In the negative perceptron problem we are given $n$ data points $({\boldsymbol x}_i,y_i)$, where ${\boldsymbol x}_i$ is a $d$-dimensional vector and $y_i\in\{+1,-1\}$ is a binary label. The data are not linearly separable, and hence we content ourselves with finding a linear classifier with the largest possible \emph{negative} margin. In other words, we want to find a unit-norm vector ${\boldsymbol \theta}$ that maximizes $\min_{i\le n}y_i\langle {\boldsymbol \theta},{\boldsymbol x}_i\rangle$. This is a non-convex optimization problem (it is equivalent to finding a maximum-norm vector in a polytope), and we study its typical properties under two random models for the data. We consider the proportional asymptotics in which $n,d\to \infty$ with $n/d\to\delta$, and prove upper and lower bounds on the maximum margin $\kappa_{\text{s}}(\delta)$ or, equivalently, on its inverse function $\delta_{\text{s}}(\kappa)$. In other words, $\delta_{\text{s}}(\kappa)$ is the overparametrization threshold: for $n/d\le \delta_{\text{s}}(\kappa)-\varepsilon$ a classifier achieving vanishing training error exists with high probability, while for $n/d\ge \delta_{\text{s}}(\kappa)+\varepsilon$ it does not. Our bounds on $\delta_{\text{s}}(\kappa)$ match to leading order as $\kappa\to -\infty$. We then analyze a linear programming algorithm for finding a solution, and characterize the corresponding threshold $\delta_{\text{lin}}(\kappa)$. We observe a gap between the interpolation threshold $\delta_{\text{s}}(\kappa)$ and the linear programming threshold $\delta_{\text{lin}}(\kappa)$, which raises the question of how other algorithms behave.
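The objective and its polytope reformulation can be illustrated with a small numerical sketch. It is not the algorithm analyzed in the paper; the helper names (`dot`, `margin`, `max_scale_in_polytope`), the isotropic Gaussian features, the uniformly random labels, and the values of $n$, $d$, $\kappa$ are all assumptions made for illustration. For $\kappa<0$ and a unit direction ${\boldsymbol u}$ with negative margin $m$, the point $s{\boldsymbol u}$ lies in the polytope $\{{\boldsymbol \theta}: y_i\langle {\boldsymbol \theta},{\boldsymbol x}_i\rangle\ge\kappa\ \forall i\}$ exactly when $s\le \kappa/m$, so the polytope contains a vector of norm at least $1$ along ${\boldsymbol u}$ if and only if ${\boldsymbol u}$ has margin at least $\kappa$.

```python
import math
import random


def dot(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def margin(theta, X, y):
    """Margin of the normalized classifier: min_i y_i <theta, x_i> / ||theta||."""
    norm = math.sqrt(dot(theta, theta))
    return min(yi * dot(theta, xi) for xi, yi in zip(X, y)) / norm


def max_scale_in_polytope(u, X, y, kappa):
    """Largest s >= 0 such that s*u satisfies y_i <s*u, x_i> >= kappa for all i.

    Assumes kappa < 0 and u of unit norm. If u already has non-negative
    margin, every positive multiple of u is feasible.
    """
    m = margin(u, X, y)
    return math.inf if m >= 0 else kappa / m


# Illustrative random instance (assumed data model: Gaussian features, random labels).
random.seed(0)
n, d, kappa = 30, 10, -2.0
X = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
y = [random.choice([-1, 1]) for _ in range(n)]

# A random unit direction, used only as a candidate classifier.
g = [random.gauss(0.0, 1.0) for _ in range(d)]
g_norm = math.sqrt(dot(g, g))
u = [gi / g_norm for gi in g]

m = margin(u, X, y)
s = max_scale_in_polytope(u, X, y, kappa)
print(f"margin of u: {m:.3f}, largest feasible norm along u: {s:.3f}")
print("u achieves margin kappa:", m >= kappa, "| norm >= 1 reachable:", s >= 1)
```

The two booleans printed on the last line always agree, which is the equivalence between maximizing the margin over unit vectors and maximizing the norm over the polytope, restricted here to a single direction.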
