Multiclass learnability is known to exhibit a properness barrier: there are learnable classes which cannot be learned by any proper learner. Binary classification faces no such barrier for learnability, but a similar one for optimal learning, which can in general only be achieved by improper learners. Fortunately, recent advances in binary classification have demonstrated that this requirement can be satisfied using aggregations of proper learners, some of which are strikingly simple. This raises a natural question: to what extent can simple aggregations of proper learners overcome the properness barrier in multiclass classification? We give a positive answer to this question for classes which have finite Graph dimension, $d_G$. Namely, we demonstrate that the optimal binary learners of Hanneke, Larsen, and Aden-Ali et al. (appropriately generalized to the multiclass setting) achieve sample complexity $O\left(\frac{d_G + \ln(1 / \delta)}{\epsilon}\right)$. This forms a strict improvement upon the sample complexity of ERM. We complement this with a lower bound demonstrating that for certain classes of Graph dimension $d_G$, majorities of ERM learners require $\Omega \left( \frac{d_G + \ln(1 / \delta)}{\epsilon}\right)$ samples. Furthermore, we show that a single ERM requires $\Omega \left(\frac{d_G \ln(1 / \epsilon) + \ln(1 / \delta)}{\epsilon}\right)$ samples on such classes, exceeding the lower bound of Daniely et al. (2015) by a factor of $\ln(1 / \epsilon)$. For multiclass learning in full generality -- i.e., for classes of finite DS dimension but possibly infinite Graph dimension -- we give a strong refutation to these learning strategies, by exhibiting a learnable class which cannot be learned to constant error by any aggregation of a finite number of proper learners.