Open-vocabulary Extreme Multi-label Classification (OXMC) extends traditional XMC by allowing predictions beyond an extremely large, predefined label set (typically $10^3$ to $10^{12}$ labels), addressing the dynamic nature of real-world labeling tasks. However, self-selection bias in data annotation leads to significant missing labels in both training and test data, particularly for less popular inputs. This creates two critical challenges: generation models learn to be ``lazy'' by under-generating labels, and evaluation becomes unreliable due to insufficient annotation in the test set. In this work, we introduce Positive-Unlabeled Sequence Learning (PUSL), which reframes OXMC as an infinite keyphrase generation task, addressing the generation model's laziness. Additionally, we propose adopting a suite of evaluation metrics, F1@$\mathcal{O}$ and the newly proposed B@$k$, to reliably assess OXMC models with incomplete ground truths. On a highly imbalanced e-commerce dataset with substantial missing labels, PUSL generates 30% more unique labels, and 72% of its predictions align with actual user queries. On the less skewed EURLex-4.3k dataset, PUSL demonstrates superior F1 scores, especially as the number of predicted labels increases from 15 to 30. Our approach effectively tackles both the modeling and evaluation challenges of OXMC with missing labels.