In multiclass classification over $n$ outcomes, the outcomes must be embedded into a real space of dimension at least $n-1$ in order to design a consistent surrogate loss, that is, one that leads to the "correct" classification regardless of the data distribution. For large $n$, such as in information retrieval and structured prediction tasks, optimizing a surrogate in $n-1$ dimensions is often intractable. We investigate ways to trade off the surrogate loss dimension, the number of problem instances, and the size of the region of consistency in the simplex for multiclass classification. Following past work, we examine an intuitive embedding procedure that maps outcomes to the vertices of convex polytopes in a low-dimensional surrogate space. We show that consistency holds on a full-dimensional subset of the simplex around each point-mass distribution, but also that, with fewer than $n-1$ dimensions, there exist distributions for which a phenomenon called hallucination occurs: the optimal report under the surrogate loss is an outcome with zero probability. Looking towards application, we derive a result for checking whether consistency holds under a given polytope embedding and low-noise assumption, providing insight into when to use a particular embedding. We provide examples of embedding $n = 2^{d}$ outcomes into the $d$-dimensional unit cube and $n = d!$ outcomes into the $d$-dimensional permutahedron under low-noise assumptions. Finally, we demonstrate that with multiple problem instances, we can learn the mode with $\frac{n}{2}$ dimensions over the whole simplex.
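The polytope embedding and the hallucination phenomenon described above can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the abstract: the surrogate is taken to be squared loss, so its minimizer is the expected embedded point, and the report is decoded as the nearest vertex. The function names (`embed_cube`, `expected_embedding`, `decode`, `report`) are hypothetical and chosen only for illustration.

```python
from itertools import product

def embed_cube(d):
    """Vertices of the d-dimensional unit cube, one per outcome (n = 2**d)."""
    return [tuple(v) for v in product((0, 1), repeat=d)]

def expected_embedding(p, vertices):
    """Minimizer of the assumed squared-loss surrogate: the mean embedded point under p."""
    d = len(vertices[0])
    return tuple(sum(p_y * v[i] for p_y, v in zip(p, vertices)) for i in range(d))

def decode(u, vertices):
    """Assumed link: index of the vertex nearest to u (ties go to the lowest index)."""
    def sqdist(v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(range(len(vertices)), key=lambda y: sqdist(vertices[y]))

def report(p, vertices):
    """Outcome reported when the surrogate is minimized exactly under distribution p."""
    return decode(expected_embedding(p, vertices), vertices)

vertices = embed_cube(3)  # n = 8 outcomes embedded in d = 3 dimensions

# Low-noise distribution: most of the mass sits on outcome 3, i.e. vertex (0, 1, 1).
low_noise = [0.1 / 7] * 8
low_noise[3] = 0.9
low_noise_report = report(low_noise, vertices)

# Uniform mass on vertices (0,1,1), (1,0,1), (1,1,0); outcome 7, vertex (1,1,1), has zero mass.
spread = [0.0] * 8
for y in (3, 5, 6):
    spread[y] = 1 / 3
spread_report = report(spread, vertices)

print(low_noise_report, low_noise[low_noise_report])  # reported outcome is the mode
print(spread_report, spread[spread_report])           # reported outcome has probability 0
```

Under these assumptions, the low-noise distribution is decoded to its mode, while the second distribution has mean $(2/3, 2/3, 2/3)$, whose nearest vertex $(1,1,1)$ corresponds to an outcome with zero probability. This is one concrete instance of the hallucination described above for embeddings with fewer than $n-1$ dimensions.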