The extreme multi-label classification (XMC) task involves learning a classifier that predicts, from a large label set, the subset of labels most relevant to a data instance. While deep neural networks (DNNs) have demonstrated remarkable success on XMC problems, the task remains challenging because the large number of output labels makes DNN training computationally expensive. This paper addresses the issue by exploring the use of random circular vectors, where each vector component is represented as a complex amplitude. In our framework, the output layer and loss function of a DNN for XMC are built by making the final layer a fully connected layer that directly predicts a low-dimensional circular vector encoding the set of labels for a data instance. We conducted experiments on synthetic datasets to verify that circular vectors have better label encoding capacity and retrieval ability than ordinary real-valued vectors. We then conducted experiments on actual XMC datasets and found that these properties of circular vectors contribute to significant improvements in task performance compared with a previous model using random real-valued vectors, while reducing the size of the output layers by up to 99%.
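To make the idea of encoding a label set in a single circular vector concrete, the following is a minimal NumPy sketch, not the model or loss function described above. It assumes that each label is assigned a random vector of unit-modulus complex components, that a label set is encoded by summing its label vectors and rescaling each component back to unit modulus, and that labels are retrieved by ranking the mean cosine of the phase differences. The function names, the normalization step, the similarity measure, and the sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def random_circular_vectors(num_labels, dim, rng):
    """Draw one circular vector per label: unit-modulus complex components
    whose phases are uniform on [-pi, pi)."""
    phases = rng.uniform(-np.pi, np.pi, size=(num_labels, dim))
    return np.exp(1j * phases)


def encode_label_set(label_vectors, label_ids):
    """Superpose the vectors of the given labels, then rescale every component
    to unit modulus so the result is again a circular vector (an assumption of
    this sketch, not a detail taken from the text)."""
    total = label_vectors[label_ids].sum(axis=0)
    return total / np.maximum(np.abs(total), 1e-12)


def score_labels(label_vectors, encoded):
    """Mean cosine of the per-component phase difference between every label
    vector and the encoded vector."""
    return (label_vectors @ np.conj(encoded)).real / encoded.shape[0]


def top_k_labels(label_vectors, encoded, k):
    """Indices of the k highest-scoring labels, best first."""
    scores = score_labels(label_vectors, encoded)
    return np.argsort(scores)[-k:][::-1]


rng = np.random.default_rng(0)
num_labels, dim = 5000, 256          # hypothetical sizes for illustration
label_vectors = random_circular_vectors(num_labels, dim, rng)
true_labels = rng.choice(num_labels, size=5, replace=False)
encoded = encode_label_set(label_vectors, true_labels)
recovered = top_k_labels(label_vectors, encoded, k=5)
print(sorted(true_labels.tolist()), sorted(recovered.tolist()))
```

In this toy setting, a network trained to predict `encoded` would emit `2 * dim` real numbers (the real and imaginary parts) instead of one logit per label, which is where a smaller output layer could come from. Retrieval quality in the sketch depends on `dim` and on the number of labels superposed.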