Text Classification via Large Language Models

Despite the remarkable success of large language models (LLMs) such as GPT-3, their performance on text classification still falls significantly short of that of fine-tuned models. This is due to (1) their lack of reasoning ability for complex linguistic phenomena (e.g., intensification, contrast, and irony), and (2) the limited number of tokens allowed in in-context learning. In this paper, we introduce Clue And Reasoning Prompting (CARP). CARP adopts a progressive reasoning strategy tailored to the complex linguistic phenomena involved in text classification: CARP first prompts LLMs to find superficial clues (e.g., keywords, tones, semantic relations, and references), from which a diagnostic reasoning process is induced to reach the final decision. To further address the limited-token issue, CARP uses a model fine-tuned on the supervised dataset to perform kNN (k-nearest-neighbor) demonstration search for in-context learning, allowing the model to take advantage of both the LLM's generalization ability and the task-specific evidence provided by the full labeled dataset. Remarkably, CARP achieves new state-of-the-art (SOTA) performance on 4 out of 5 widely used text-classification benchmarks: 97.39 (+1.24) on SST-2, 96.40 (+0.72) on AGNews, 98.78 (+0.25) on R8, and 96.95 (+0.6) on R52, along with performance comparable to SOTA on MR (92.39 vs. 93.3). More importantly, we find that CARP performs impressively in low-resource and domain-adaptation setups. Specifically, using 16 examples per class, CARP achieves performance comparable to that of supervised models trained with 1,024 examples per class.
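
The two ingredients described above (kNN demonstration retrieval and a clue-then-reasoning prompt) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Example`, `cosine`, `knn_demonstrations`, and `build_carp_prompt` are hypothetical; toy 2-d vectors stand in for embeddings from the fine-tuned model; the clue and reasoning annotations on demonstrations are assumed to be available; the prompt wording is invented; and the actual LLM call is omitted.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    text: str
    label: str
    embedding: List[float]  # assumed: vector from a fine-tuned encoder (toy values here)
    clues: str = ""         # hypothetical clue annotation for a demonstration
    reasoning: str = ""     # hypothetical reasoning annotation for a demonstration


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_demonstrations(query_emb: List[float], train: List[Example], k: int) -> List[Example]:
    """Return the k training examples whose embeddings are most cosine-similar to the query."""
    ranked = sorted(train, key=lambda ex: cosine(query_emb, ex.embedding), reverse=True)
    return ranked[:k]


def build_carp_prompt(query_text: str, demos: List[Example]) -> str:
    """Assemble a clue -> reasoning -> label prompt (hypothetical wording)."""
    parts = ["Classify the sentiment of the input as POSITIVE or NEGATIVE.\n"]
    for ex in demos:
        parts.append(
            f"INPUT: {ex.text}\n"
            f"CLUES: {ex.clues}\n"
            f"REASONING: {ex.reasoning}\n"
            f"SENTIMENT: {ex.label}\n"
        )
    # The prompt ends at "CLUES:" so the LLM is prompted to list clues first,
    # then reason, then emit a label, mirroring the demonstrations.
    parts.append(f"INPUT: {query_text}\nCLUES:")
    return "\n".join(parts)


# Usage with toy data (all texts, vectors, and annotations are made up).
train = [
    Example("a moving, beautifully acted film", "POSITIVE", [0.9, 0.1],
            "moving; beautifully acted", "Both clues praise the film."),
    Example("a dull, lifeless script", "NEGATIVE", [0.1, 0.9],
            "dull; lifeless", "Both clues criticize the script."),
    Example("great cast, but the plot never lands", "NEGATIVE", [0.4, 0.8],
            "'great cast' vs. 'never lands'", "Contrast via 'but'; the negative clause dominates."),
]
query_emb = [0.3, 0.85]  # assumed embedding of the test input
demos = knn_demonstrations(query_emb, train, k=2)
prompt = build_carp_prompt("the actors try hard, but it is a slog", demos)
print(prompt)
```

Retrieving demonstrations by similarity in the fine-tuned model's embedding space is what lets a few prompt slots draw on the full labeled set; swapping the linear scan for an approximate nearest-neighbor index would be a natural change for large training sets.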

翻译：尽管GPT-3等大规模语言模型取得了显著成功，但在文本分类任务中，其性能仍明显低于微调模型。原因在于：（1）模型缺乏处理复杂语言现象（如强化、对比、反讽等）的推理能力；（2）上下文学习中允许的标记数量有限。本文提出**线索与推理提示**（CARP）。CARP采用渐进推理策略，专门应对文本分类中的复杂语言现象：首先引导LLMs发现表面线索（如关键词、语气、语义关系、指代等），在此基础上诱导诊断性推理过程以作出最终决策。为解决标记数量限制问题，CARP在监督数据集上使用微调模型进行上下文学习中的k近邻演示搜索，使模型既能发挥LLM的泛化能力，又能利用完整标注数据集提供的任务特定证据。值得注意的是，在5个广泛使用的文本分类基准中，CARP在4个基准上取得了新的最优性能：SST-2为97.39%（+1.24%）、AGNews为96.40%（+0.72%）、R8为98.78%（+0.25%）、R52为96.95%（+0.6%），在MR上达到与SOTA相当的性能（92.39%对93.3%）。更重要的是，我们发现CARP在低资源设置和领域自适应场景中展现出卓越能力：具体而言，每类仅使用16个样本时，CARP即可达到每类1024个样本的监督模型性能。