Extreme multi-label text classification commonly uses the label hierarchy to partition an extremely large label set into multiple label groups, turning the task into a set of simpler multi-group multi-label classification tasks. Current research encodes labels as a fixed-length vector, which requires establishing a separate classifier for each label group. The problem is how to build only one classifier without sacrificing the label relationships in the hierarchy. This paper recasts extreme multi-label classification as a multi-answer questioning task and also proposes an auxiliary classification evaluation metric. The proposed method and evaluation metric are applied to the legal domain, and the use of legal BERT models and the task distribution are discussed. The experimental results show that the proposed hierarchy and multi-answer questioning task can perform extreme multi-label classification on the EURLEX dataset. When fine-tuning on the multi-label classification task, the domain-adapted BERT models showed no apparent advantage in this experiment. The method is also theoretically applicable to zero-shot learning.
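The idea of replacing per-group classifiers with a single multi-answer questioning model can be illustrated with a minimal sketch. It is not the authors' implementation: the toy hierarchy, the question template, and the function name `build_questions` are all assumptions made for illustration. Each label group (the children of one hierarchy node) becomes one question whose candidate answers are the labels in that group, so groups of different sizes can in principle be handled by one model instead of one fixed-length output vector per group.

```python
from typing import Dict, List, Set

# Hypothetical toy hierarchy (parent label -> child labels); not taken from EURLEX.
HIERARCHY: Dict[str, List[str]] = {
    "root": ["agriculture", "trade"],
    "agriculture": ["fisheries", "livestock"],
    "trade": ["tariffs", "exports", "imports"],
}


def build_questions(
    text: str, gold: Set[str], hierarchy: Dict[str, List[str]]
) -> List[dict]:
    """Turn one labeled document into one multi-answer question per label group.

    The question wording is an assumed template, not the one used in the paper.
    """
    examples = []
    for parent, children in hierarchy.items():
        question = f"Which sub-topics of '{parent}' apply to this document?"
        # A group may have zero, one, or several correct answers.
        answers = [child for child in children if child in gold]
        examples.append(
            {
                "question": question,
                "text": text,
                "candidates": children,
                "answers": answers,
            }
        )
    return examples


# Example document with gold labels drawn from two levels of the hierarchy.
examples = build_questions(
    "Regulation on fishing quotas", {"agriculture", "fisheries"}, HIERARCHY
)
for ex in examples:
    print(ex["question"], "->", ex["answers"])
```

Because the candidate labels appear in the input rather than as fixed output positions, a sketch like this also suggests why the approach could extend to labels unseen during training, which is the zero-shot setting the abstract mentions.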