We show hardness of improperly learning halfspaces in the agnostic model, in both the distribution-independent and the distribution-specific settings, based on the assumption that worst-case lattice problems, such as GapSVP or SIVP, are hard. In particular, we show that under this assumption there is no efficient algorithm that outputs any binary hypothesis, not necessarily a halfspace, achieving misclassification error better than $\frac{1}{2} - \gamma$, even if the optimal misclassification error is as small as $\delta$. Here, $\gamma$ can be smaller than the inverse of any polynomial in the dimension, and $\delta$ can be as small as $\exp(-\Omega(\log^{1-c}(d)))$, where $0 < c < 1$ is an arbitrary constant and $d$ is the dimension. For the distribution-specific setting, we show that if the marginal distribution is standard Gaussian, then for any $\beta > 0$, learning halfspaces up to error $OPT_{LTF} + \epsilon$ takes time at least $d^{\tilde{\Omega}(1/\epsilon^{2-\beta})}$ under the same hardness assumptions. Similarly, we show that learning degree-$\ell$ polynomial threshold functions up to error $OPT_{{PTF}_\ell} + \epsilon$ takes time at least $d^{\tilde{\Omega}(\ell^{2-\beta}/\epsilon^{2-\beta})}$. Here, $OPT_{LTF}$ and $OPT_{{PTF}_\ell}$ denote the best error achievable by any halfspace and by any degree-$\ell$ polynomial threshold function, respectively. Our lower bounds qualitatively match algorithmic guarantees and (nearly) recover known lower bounds based on non-worst-case assumptions. Previously, such hardness results [Daniely16, DKPZ21] were based on average-case complexity assumptions or restricted to the statistical query model. Our work gives the first hardness results that base these fundamental learning problems on worst-case complexity assumptions. It is inspired by a sequence of recent works showing hardness of learning well-separated Gaussian mixtures based on worst-case lattice problems.