In traditional models of supervised learning, the goal of a learner, given examples from an arbitrary joint distribution on $\mathbb{R}^d \times \{\pm 1\}$, is to output a hypothesis that is competitive (to within $\epsilon$) with the best-fitting concept from some class. To escape strong hardness results for learning even simple concept classes, we introduce a smoothed-analysis framework that requires a learner to compete only with the best classifier that is robust to small random Gaussian perturbations. This subtle change allows us to give a wide array of learning results for any concept that (1) depends on a low-dimensional subspace (a.k.a. a multi-index model) and (2) has bounded Gaussian surface area. This class includes functions of halfspaces and (low-dimensional) convex sets, cases known to be learnable in non-smoothed settings only with respect to highly structured distributions such as Gaussians. Surprisingly, our analysis also yields new results for traditional non-smoothed frameworks such as learning with margin. In particular, we obtain the first algorithm for agnostically learning intersections of $k$ halfspaces in time $k^{\mathrm{poly}(\log k/(\epsilon\gamma))}$, where $\gamma$ is the margin parameter. Before our work, the best-known runtime was exponential in $k$ (Arriaga and Vempala, 1999).
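To make the smoothed benchmark concrete, here is a minimal Monte Carlo sketch. It rests on assumptions that are not taken from the paper: the perturbation is modeled as isotropic Gaussian noise $\sigma z$ with $z \sim \mathcal{N}(0, I_d)$ added to each input, and the concept is an intersection of $k$ halfspaces. The names `intersection_of_halfspaces`, `smoothed_error`, `sigma`, and `n_draws` are hypothetical. The sketch evaluates the perturbed error $\Pr[f(x + \sigma z) \neq y]$ for a single fixed $f$. One natural reading of the benchmark the abstract describes is the smallest such error over the concept class, but the paper's exact definition may differ.

```python
import numpy as np


def intersection_of_halfspaces(X, W, b):
    """Label +1 iff <w_i, x> >= b_i for every halfspace i, else -1.

    X: (n, d) points; W: (k, d) normal vectors; b: (k,) thresholds.
    """
    inside = np.all(X @ W.T >= b, axis=1)
    return np.where(inside, 1, -1)


def smoothed_error(X, y, W, b, sigma, n_draws=100, seed=0):
    """Monte Carlo estimate of Pr[f(x + sigma * z) != y] with z ~ N(0, I_d).

    Hypothetical helper: averages the 0-1 error over n_draws independent
    Gaussian perturbations of the inputs. With sigma = 0 it reduces to the
    ordinary 0-1 error of f on (X, y).
    """
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_draws):
        Z = rng.standard_normal(X.shape)  # one Gaussian perturbation per point
        preds = intersection_of_halfspaces(X + sigma * Z, W, b)
        errors.append(np.mean(preds != y))
    return float(np.mean(errors))


# Usage: labels generated by f itself, so f has zero unperturbed error.
rng = np.random.default_rng(1)
d, k, n = 5, 3, 2000
W = rng.standard_normal((k, d))
b = np.zeros(k)
X = rng.standard_normal((n, d))
y = intersection_of_halfspaces(X, W, b)
print(smoothed_error(X, y, W, b, sigma=0.0))  # 0.0
print(smoothed_error(X, y, W, b, sigma=0.3))  # positive: points near the boundary flip
```

The second call shows why a concept that fits the data perfectly can still have nonzero error under perturbation. Points close to a decision boundary change label once noise is added. A benchmark defined through such perturbed errors therefore favors concepts whose boundaries are not too complex, which is one intuition for why bounded Gaussian surface area is relevant.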