Fine-Tuning Causal LLMs for Text Classification: Embedding-Based vs. Instruction-Based Approaches

We explore efficient strategies to fine-tune decoder-only Large Language Models (LLMs) for downstream text classification under resource constraints. Two approaches are investigated: (1) attaching a classification head to a pretrained causal LLM and fine-tuning it on the task, using the LLM's final-token embedding as a sequence representation, and (2) instruction-tuning the LLM in a prompt-to-response format for classification. To enable single-GPU fine-tuning of models up to 8B parameters, we combine 4-bit model quantization with Low-Rank Adaptation (LoRA) for parameter-efficient training. Experiments on two patent benchmarks, a 5-class single-label internal corpus and the public WIPO-Alpha multi-label dataset with 14 categories, show that the embedding-head approach matches or exceeds fine-tuned BERT baselines on single-label classification while training 10-30x fewer parameters. Instruction-tuning is competitive only in the multi-label regime, and only with substantially larger trainable budgets of at least 100M parameters. These results demonstrate that directly leveraging the internal representations of causal LLMs, together with efficient fine-tuning techniques, yields strong classification performance under limited computational resources. We discuss the advantages of each approach and outline practical guidelines and future directions for optimizing LLM fine-tuning in classification scenarios.
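As a rough illustration of the two setups described above, the following Python sketch shows (a) pooling the final non-padding token's hidden state and passing it through a linear classification head, (b) turning a labeled example into a prompt-to-response pair for instruction tuning, and (c) the standard count of trainable parameters that LoRA adds to one weight matrix. It is not the authors' implementation: every function name, the prompt wording, the placeholder labels, and the toy dimensions are assumptions made for illustration. In a real run the hidden states would come from the quantized LLM rather than from hand-written lists.

```python
from typing import Dict, List, Sequence


def last_token_index(attention_mask: Sequence[int]) -> int:
    """Return the position of the last real (mask == 1) token."""
    if 1 not in attention_mask:
        raise ValueError("sequence contains no real tokens")
    return len(attention_mask) - 1 - list(reversed(attention_mask)).index(1)


def pool_last_token(hidden_states: Sequence[Sequence[float]],
                    attention_mask: Sequence[int]) -> List[float]:
    """Use the final real token's hidden state as the sequence representation."""
    return list(hidden_states[last_token_index(attention_mask)])


def linear_head(embedding: Sequence[float],
                weights: Sequence[Sequence[float]],
                bias: Sequence[float]) -> List[float]:
    """Classification head: one logit per class, logits = W @ embedding + b."""
    return [sum(w * x for w, x in zip(row, embedding)) + b
            for row, b in zip(weights, bias)]


def format_instruction_example(text: str, labels: Sequence[str],
                               label_set: Sequence[str]) -> Dict[str, str]:
    """Build a prompt/response pair (hypothetical prompt wording)."""
    prompt = (
        "Classify the following patent text. "
        f"Choose from: {', '.join(label_set)}.\n\n"
        f"Text: {text}\n\nCategories:"
    )
    return {"prompt": prompt, "response": " " + ", ".join(labels)}


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds A (rank x d_in) and B (d_out x rank) to one weight matrix."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    # Toy sequence: 3 positions, hidden size 2, last position is padding.
    hidden = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
    mask = [1, 1, 0]
    emb = pool_last_token(hidden, mask)

    # Toy head with 3 classes (assumed weights, for illustration only).
    logits = linear_head(emb, [[1, 0], [0, 1], [1, 1]], [0.0, 0.0, 0.0])
    print("embedding:", emb, "logits:", logits)

    # Placeholder labels "A", "B", "C" stand in for real category names.
    pair = format_instruction_example("A method for ...", ["A", "C"],
                                      ["A", "B", "C"])
    print(pair["prompt"] + pair["response"])

    # Assumed dimensions: a 4096 x 4096 projection adapted at rank 16.
    print("LoRA params per matrix:", lora_trainable_params(4096, 4096, 16))
```

The embedding-head path trains only the adapters plus a small linear layer, whereas the instruction path needs the model to generate label text, which is one plausible reason the two approaches differ in the trainable budget they require.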

