Many conventional learning algorithms rely on loss functions other than the natural 0-1 loss for computational efficiency and theoretical tractability. Among them are approaches based on absolute loss (L1 regression) and square loss (L2 regression). The first is proved to be an \textit{agnostic} PAC learner for various important concept classes such as \textit{juntas} and \textit{half-spaces}. The second, on the other hand, is preferable because of its computational cost, which is linear in the sample size. However, its agnostic PAC learnability is still unknown, as guarantees have been proved only under distributional restrictions. The question of whether L2 regression is an agnostic PAC learner for the 0-1 loss has been open since 1993 and has yet to be answered. This paper resolves this problem for the junta class on the Boolean cube, proving agnostic PAC learning of k-juntas using L2 polynomial regression. Moreover, we present a new PAC learning algorithm based on the Boolean Fourier expansion with lower computational complexity. Fourier-based algorithms, such as that of Linial et al. (1993), have previously been used only under distributional restrictions, such as the uniform distribution. We show that with an appropriate change, one can apply those algorithms in agnostic settings without any distributional assumption. We prove our results by connecting PAC learning with the 0-1 loss to the minimum mean square estimation (MMSE) problem. We derive an elegant upper bound on the 0-1 loss in terms of the minimum mean square error and show that the sign of the MMSE estimator is a PAC learner for any concept class that contains it.
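The central idea of the abstract is to fit a real-valued predictor by least squares and then classify with its sign. A small sketch can illustrate this. The Python code below is not the authors' algorithm. It assumes a finite weighted sample over {-1,1}^n and fits a polynomial over the parity characters chi_T(x) = prod_{i in T} x_i of degree at most d by weighted least squares, solving the normal equations with Gauss-Jordan elimination. It then reports the 0-1 error of sign(f) next to the mean squared error of f. All function names (`characters`, `chi`, `solve`, `l2_poly_regression`, `losses`) and the toy data are hypothetical. The code also relies on an elementary pointwise fact: a sign mistake forces |y - f(x)| >= 1, so the 0-1 error can never exceed the mean squared error. That is a standard inequality and is not necessarily the sharper bound derived in the paper.

```python
from itertools import combinations, product
from math import prod

# Hypothetical helper names throughout; an illustrative sketch, not the paper's method.

def characters(n, d):
    """Index sets T with |T| <= d; each defines chi_T(x) = prod_{i in T} x_i."""
    return [T for k in range(d + 1) for T in combinations(range(n), k)]

def chi(T, x):
    return prod(x[i] for i in T)  # empty product is 1 (the constant character)

def solve(G, b):
    """Solve G c = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(b)
    A = [list(row) + [b[i]] for i, row in enumerate(G)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        if abs(A[piv][col]) < 1e-12:
            raise ValueError("singular Gram matrix: sample lacks full support")
        A[col], A[piv] = A[piv], A[col]
        for r in range(m):
            if r != col:
                f = A[r][col] / A[col][col]
                for c in range(col, m + 1):
                    A[r][c] -= f * A[col][c]
    return [A[i][m] / A[i][i] for i in range(m)]

def l2_poly_regression(data, n, d):
    """Weighted least squares over characters of degree <= d.
    data: list of (x, y, w) with x in {-1,1}^n, y in {-1,1}, w > 0."""
    Ts = characters(n, d)
    # Normal equations: G[S][T] = sum w*chi_S*chi_T,  b[S] = sum w*y*chi_S
    G = [[sum(w * chi(S, x) * chi(T, x) for x, _, w in data) for T in Ts]
         for S in Ts]
    b = [sum(w * y * chi(S, x) for x, y, w in data) for S in Ts]
    coef = solve(G, b)
    return lambda x: sum(c * chi(T, x) for c, T in zip(coef, Ts))

def sign(v):
    return 1 if v >= 0 else -1  # convention: sign(0) = +1

def losses(f, data):
    """Return (0-1 error of sign(f), mean squared error of f) under weights w."""
    total = sum(w for _, _, w in data)
    err01 = sum(w for x, y, w in data if sign(f(x)) != y) / total
    mse = sum(w * (y - f(x)) ** 2 for x, y, w in data) / total
    return err01, mse

# Toy agnostic data on {-1,1}^3: labels follow the 2-junta x0*x1 with 25% noise,
# and the marginal on x is non-uniform (points with x2 = +1 carry twice the weight).
data = []
for x in product((-1, 1), repeat=3):
    m = 2 if x[2] == 1 else 1
    t = x[0] * x[1]
    data.append((x, t, 3 * m))   # label agrees with the junta
    data.append((x, -t, m))      # label flipped

for d in (1, 2):
    err01, mse = losses(l2_poly_regression(data, n=3, d=d), data)
    # Pointwise, sign(f(x)) != y implies (y - f(x))**2 >= 1, hence err01 <= mse.
    print(f"d={d}: 0-1 error={err01:.3f}, MSE={mse:.3f}")
```

With d = 1 the fit is identically zero, because x0*x1 is orthogonal to every character of degree at most 1 under this marginal. Its sign is therefore constant and errs half the time. With d = 2 the fit recovers E[y | x] = x0*x1/2, and its sign matches the junta, reaching the noise rate of 0.25.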