Label embedding is a framework for multiclass classification problems where each label is represented by a distinct vector of some fixed dimension, and training involves matching model output to the vector representing the correct label. While label embedding has been successfully applied in extreme classification and zero-shot learning, and offers both computational and statistical advantages, its theoretical foundations remain poorly understood. This work presents an analysis of label embedding in the context of extreme multiclass classification, where the number of classes $C$ is very large. We present an excess risk bound that reveals a trade-off between computational and statistical efficiency, quantified via the coherence of the embedding matrix. We further show that under the Massart noise condition, the statistical penalty for label embedding vanishes with sufficiently low coherence. Our analysis supports an algorithm that is simple, scalable, and easily parallelizable, and experimental results demonstrate its effectiveness in large-scale applications.
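To make the setup concrete, the following is a minimal sketch of label embedding, not the algorithm analyzed in this work. It assumes random Gaussian embeddings normalized to unit length, takes coherence to be the largest absolute inner product between two distinct embedding vectors (the standard definition, assumed here to be the intended one), and decodes a model output to the label with the nearest embedding. The function names `random_label_embeddings`, `coherence`, and `decode` are hypothetical and chosen only for illustration.

```python
import numpy as np


def random_label_embeddings(num_classes, dim, seed=0):
    """Assumed construction: one unit-norm random Gaussian vector per label."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((num_classes, dim))
    return G / np.linalg.norm(G, axis=1, keepdims=True)


def coherence(G):
    """Largest absolute inner product between two distinct unit-norm rows of G."""
    gram = np.abs(G @ G.T)
    np.fill_diagonal(gram, 0.0)  # ignore each row's inner product with itself
    return float(gram.max())


def decode(outputs, G):
    """Map each output vector to the label whose embedding is nearest.

    For unit-norm rows, minimizing Euclidean distance to a row is the same
    as maximizing the inner product with that row.
    """
    return np.argmax(outputs @ G.T, axis=1)


# Illustrative sizes only: C = 1000 labels embedded in 64 dimensions.
C, n = 1000, 64
G = random_label_embeddings(C, n)
mu = coherence(G)

# Regression targets for training are the embedding vectors of the true labels.
labels = np.array([3, 141, 592, 653])
targets = G[labels]

# A model whose outputs land close to the targets decodes to the true labels.
noise = np.random.default_rng(1).standard_normal(targets.shape)
predicted = decode(targets + 0.01 * noise, G)
```

In this sketch the model only has to produce `n`-dimensional outputs rather than `C` scores, which is the source of the computational saving. Lowering `n` generally raises the coherence `mu`, which is the quantity through which the trade-off described above is expressed.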