The Sauer-Shelah-Perles Lemma is a cornerstone of combinatorics and learning theory, bounding the size of a binary hypothesis class in terms of its Vapnik-Chervonenkis (VC) dimension. For classes of functions over a $k$-ary alphabet (the multiclass setting), the Natarajan dimension has long served as an analogue of the VC dimension, yet the corresponding Sauer-type bounds are suboptimal for alphabet sizes $k>2$. In this work, we establish a sharp Sauer inequality for multiclass and list prediction. Our bound is expressed in terms of the Daniely--Shalev-Shwartz (DS) dimension and, more generally, in terms of its extension, the list-DS dimension; these are the combinatorial parameters that characterize multiclass and list PAC learnability. Our bound is tight for every alphabet size $k$, list size $\ell$, and dimension value. It replaces the exponential dependence on $\ell$ in the Natarajan-based bound with the optimal polynomial dependence, and it improves the dependence on $k$ as well. Our proof uses the polynomial method. In contrast to the classical VC case, where several direct combinatorial proofs are known, we are not aware of any purely combinatorial proof in the DS setting. This motivates several directions for future research, which are discussed in the paper. As consequences, we obtain improved sample complexity upper bounds for list PAC learning and for uniform convergence of list predictors, sharpening the recent results of Charikar et al.~(STOC~2023), Hanneke et al.~(COLT~2024), and Brukhim et al.~(NeurIPS~2024).