In multiclass classification over $n$ outcomes, the outcomes must be embedded into the reals with dimension at least $n-1$ in order to design a consistent surrogate loss that leads to the "correct" classification regardless of the data distribution. For large $n$, such as in information retrieval and structured prediction tasks, optimizing a surrogate in $n-1$ dimensions is often intractable. We investigate ways to trade off the surrogate loss dimension, the number of problem instances, and restrictions on the region of the simplex where consistency holds for multiclass classification. Following past work, we examine an intuitive embedding procedure that maps outcomes to the vertices of convex polytopes in a low-dimensional surrogate space. We show that, around each point mass distribution, there exist full-dimensional subsets of the simplex on which consistency holds. We also show that with fewer than $n-1$ dimensions there exist distributions for which a phenomenon called hallucination occurs, in which the optimal report under the surrogate loss is an outcome with zero probability. Looking towards application, we derive a result for checking whether consistency holds under a given polytope embedding and low-noise assumption, which provides insight into when to use a particular embedding. We give examples of embedding $n = 2^{d}$ outcomes into the $d$-dimensional unit cube and $n = d!$ outcomes into the $d$-dimensional permutahedron under low-noise assumptions. Finally, we demonstrate that with multiple problem instances, we can learn the mode with $\frac{n}{2}$ dimensions over the whole simplex.
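To make the polytope embedding and the hallucination phenomenon concrete, here is a minimal sketch. It assumes a squared-loss surrogate $L(u, y) = \lVert u - \varphi(y) \rVert^2$, whose expected-loss minimizer is the mean embedding $\mathbb{E}_p[\varphi(Y)]$, together with a nearest-vertex link. This is one illustrative choice and not necessarily the surrogate or link used in the paper; all function names (`cube_vertices`, `link_cube`, `link_permutahedron`, and so on) are hypothetical. The example embeds $2^3 = 8$ outcomes into the 3-dimensional unit cube and places mass $1/3$ on each of $(1,1,0)$, $(1,0,1)$, $(0,1,1)$. The optimal report is then $(2/3, 2/3, 2/3)$, which links to $(1,1,1)$, an outcome with zero probability.

```python
import itertools

import numpy as np


# Assumption (illustrative only): squared-loss surrogate with a nearest-vertex link.
def cube_vertices(d):
    """Embed n = 2**d outcomes as the vertices of the unit cube {0,1}^d (one row each)."""
    return np.array(list(itertools.product([0, 1], repeat=d)), dtype=float)


def permutahedron_vertices(d):
    """Embed n = d! outcomes as the permutations of (1, ..., d) (one row each)."""
    return np.array(list(itertools.permutations(range(1, d + 1))), dtype=float)


def optimal_report(p, vertices):
    """Minimizer of expected squared loss: the p-weighted mean of the embedded outcomes."""
    return p @ vertices


def link_cube(u):
    """Nearest unit-cube vertex to u in Euclidean distance: round each coordinate at 0.5."""
    return (u > 0.5).astype(float)


def link_permutahedron(u):
    """Nearest permutahedron vertex to u: assign value k+1 to the k-th smallest coordinate.

    ||u - s||^2 = ||u||^2 + ||s||^2 - 2<u, s>, and ||s||^2 is the same for every
    permutation s, so by the rearrangement inequality the minimizer sorts like u.
    """
    ranks = np.empty(len(u))
    ranks[np.argsort(u, kind="stable")] = np.arange(1, len(u) + 1)
    return ranks


def linked_outcome(p, vertices, link):
    """Index of the outcome whose embedding the linked optimal report lands on."""
    v = link(optimal_report(p, vertices))
    return int(np.flatnonzero((vertices == v).all(axis=1))[0])


if __name__ == "__main__":
    V = cube_vertices(3)
    p = np.zeros(len(V))
    for v in [(1, 1, 0), (1, 0, 1), (0, 1, 1)]:
        p[np.flatnonzero((V == v).all(axis=1))[0]] = 1 / 3
    k = linked_outcome(p, V, link_cube)
    # Hallucination: the report links to (1, 1, 1), which has probability 0.
    print("linked outcome:", V[k], "probability:", p[k])

    # Low noise: if the mode has probability > 1/2, each coordinate of the mean
    # falls on the mode's side of 0.5, so rounding recovers the mode in this sketch.
    q = np.full(len(V), 0.4 / 7)
    q[5] = 0.6  # V[5] = (1, 0, 1) is the mode
    print("mode recovered:", linked_outcome(q, V, link_cube) == 5)
```

The rounding link for the cube follows because each coordinate of the mean embedding equals $P(Y_i = 1)$. The $1/2$ threshold above is a property of this particular squared-loss sketch and is not a claim about the low-noise conditions derived in the paper.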