While the optimal sample complexity of binary classification in terms of the VC dimension is well established, determining the optimal sample complexity of multiclass classification has remained open. The appropriate complexity parameter for multiclass classification is the DS dimension, and despite significant efforts, a gap of $\sqrt{\text{DS}}$ has persisted between the upper and lower bounds on sample complexity. Recent work by Hanneke et al. (2026) gives a novel algebraic characterization of multiclass hypothesis classes in terms of their DS dimension. Building on this, we show that the maximum hypergraph density of any multiclass hypothesis class is upper-bounded by its DS dimension. This proves a longstanding conjecture of Daniely and Shalev-Shwartz (2014). As a consequence, we determine the optimal dependence of the sample complexity on the DS dimension for multiclass learning as well as list learning.