Multi-Head Encoding for Extreme Label Classification

Real-world instances fall into a huge number of categories, and each instance may carry multiple labels. eXtreme Label Classification (XLC) was established to distinguish such massive label sets with machine learning. However, as the number of categories grows, so do the number of parameters and the number of nonlinear operations in the classifier, which leads to a Classifier Computational Overload Problem (CCOP). To address this, we propose a Multi-Head Encoding (MHE) mechanism that replaces the vanilla classifier with a multi-head classifier. During training, MHE decomposes each extreme label into the product of multiple short local labels, and each head is trained on its own local labels. During testing, the predicted label is calculated directly from the local predictions of the heads. This reduces the computational load geometrically. According to the characteristics of different XLC tasks, e.g., single-label, multi-label, and model pretraining tasks, we further propose three MHE-based implementations, i.e., Multi-Head Product, Multi-Head Cascade, and Multi-Head Sampling, to cope with CCOP more effectively. Moreover, by generalizing the low-rank approximation problem from the Frobenius norm to Cross-Entropy, we theoretically demonstrate that MHE can achieve performance approximately equivalent to that of the vanilla classifier. Experimental results show that the proposed methods achieve state-of-the-art performance while significantly streamlining the training and inference processes of XLC tasks. The source code is publicly available at https://github.com/Anoise/MHE.
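
To make the label decomposition concrete, the following is a minimal sketch, not the authors' implementation. It assumes the "product of local labels" can be realized as a mixed-radix encoding of the global label index, with one digit per head. The names `encode_label`, `decode_label`, `predict`, and the head sizes are hypothetical and chosen only for illustration.

```python
from math import prod


def encode_label(label, head_sizes):
    """Split a global label index into one local label per head.

    Assumption: the decomposition is a mixed-radix encoding, where
    head i has head_sizes[i] local classes.
    """
    if not 0 <= label < prod(head_sizes):
        raise ValueError("label out of range for the given head sizes")
    digits = []
    for size in reversed(head_sizes):
        label, digit = divmod(label, size)
        digits.append(digit)
    return digits[::-1]


def decode_label(local_labels, head_sizes):
    """Recombine per-head local labels into the global label index."""
    label = 0
    for digit, size in zip(local_labels, head_sizes):
        label = label * size + digit
    return label


def predict(head_scores, head_sizes):
    """Take the argmax of each head independently, then decode."""
    local = [max(range(len(s)), key=s.__getitem__) for s in head_scores]
    return decode_label(local, head_sizes)


# Hypothetical setting: 3 heads of 100 local classes cover 1,000,000 labels.
head_sizes = [100, 100, 100]
local = encode_label(123456, head_sizes)
print(local)                             # [12, 34, 56]
print(decode_label(local, head_sizes))   # 123456
print(sum(head_sizes), "outputs instead of", prod(head_sizes))
```

Under this assumption the classifier emits the sum of the head sizes (300 outputs here) rather than their product (1,000,000), which is one way to read the geometric reduction described above. Decoding by per-head argmax matches the argmax over the combined scores when the combined score is the product of non-negative per-head scores such as softmax probabilities.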
