Multi-label classification problems with thousands of classes are hard to solve with in-context learning alone: language models (LMs) may lack prior knowledge about the precise classes or how to assign them, and it is generally infeasible to demonstrate every class in a prompt. We propose a general program, $\texttt{Infer--Retrieve--Rank}$, that defines multi-step interactions between LMs and retrievers to tackle such problems efficiently. We implement this program using the $\texttt{DSPy}$ programming model, which specifies in-context systems in a declarative manner, and use $\texttt{DSPy}$ optimizers to tune it towards specific datasets by bootstrapping only tens of few-shot examples. Our primary extreme classification program, optimized separately for each task, attains state-of-the-art results across three benchmarks (HOUSE, TECH, TECHWOLF). Applied to a benchmark with vastly different characteristics (BioDEX), the same program also attains competitive performance. Unlike prior work, our proposed solution requires no finetuning, is easily applicable to new tasks, reduces the need for prompt engineering, and requires only tens of labeled examples. Our code is public at https://github.com/KarelDO/xmc.dspy.
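To make the control flow concrete, the following is a minimal sketch of a three-stage infer, retrieve, rank pipeline in plain Python. It is not the released implementation and does not use the $\texttt{DSPy}$ API. The stage roles are read off the program's name, and every identifier (`infer_retrieve_rank`, `toy_infer`, `toy_rank`, `token_overlap`) is hypothetical. The two LM calls are replaced by deterministic stand-ins, and the retriever is assumed to be a simple token-overlap scorer rather than a learned one.

```python
import re
from collections import Counter
from typing import Callable, List


def token_overlap(a: str, b: str) -> int:
    """Count lowercase word tokens shared by two strings (toy similarity)."""
    ca = Counter(re.findall(r"\w+", a.lower()))
    cb = Counter(re.findall(r"\w+", b.lower()))
    return sum((ca & cb).values())


def infer_retrieve_rank(
    text: str,
    labels: List[str],
    infer: Callable[[str], List[str]],
    rank: Callable[[str, List[str]], List[str]],
    k: int = 3,
) -> List[str]:
    # Infer: an LM would propose free-text guesses for applicable labels.
    queries = infer(text)
    # Retrieve: map each guess onto the real label space, so that the
    # full label set never has to appear in a prompt.
    scored = [
        (max((token_overlap(q, label) for q in queries), default=0), label)
        for label in labels
    ]
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps ties in input order
    candidates = [label for score, label in scored if score > 0][:k]
    # Rank: an LM would reorder the short candidate list given the input.
    return rank(text, candidates)


def toy_infer(text: str) -> List[str]:
    """Stand-in for the LM inference call: split the input on commas."""
    return [part.strip() for part in text.split(",") if part.strip()]


def toy_rank(text: str, candidates: List[str]) -> List[str]:
    """Stand-in for the LM ranking call: order by overlap with the input."""
    return sorted(candidates, key=lambda label: -token_overlap(text, label))


labels = ["Python programming", "Data analysis", "Project management", "Welding"]
text = "Experience with python scripting, statistical data analysis"
print(infer_retrieve_rank(text, labels, toy_infer, toy_rank))
```

In this sketch only the short candidate list, never the full label set, reaches the ranking step. That is one way to read the observation that demonstrating every class in a prompt is infeasible.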