We explore efficient strategies to fine-tune decoder-only Large Language Models (LLMs) for downstream text classification under resource constraints. Two approaches are investigated: (1) attaching a classification head to a pretrained causal LLM and fine-tuning it on the task, using the LLM's final-token embedding as a sequence representation, and (2) instruction-tuning the LLM in a prompt-to-response format for classification. To enable single-GPU fine-tuning of models up to 8B parameters, we combine 4-bit model quantization with Low-Rank Adaptation (LoRA) for parameter-efficient training. Experiments on two patent benchmarks, a 5-class single-label internal corpus and the public WIPO-Alpha multi-label dataset with 14 categories, show that the embedding-head approach matches or exceeds fine-tuned BERT baselines on single-label classification while training 10-30x fewer parameters. Instruction-tuning is competitive only in the multi-label regime, and only with substantially larger trainable budgets of at least 100M parameters. These results demonstrate that directly leveraging the internal representations of causal LLMs, together with efficient fine-tuning techniques, yields strong classification performance under limited computational resources. We discuss the advantages of each approach and outline practical guidelines and future directions for optimizing LLM fine-tuning in classification scenarios.
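To make the embedding-head approach and the parameter savings concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes right-padded inputs, so the final real token is the last position where the attention mask is 1, and it counts LoRA parameters with the standard formula of r(d_in + d_out) per adapted weight matrix. The hidden size, layer count, rank, and number of adapted matrices per layer are hypothetical values chosen for illustration; the abstract does not state the configuration used.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters added by one LoRA adapter.

    LoRA learns two low-rank factors, A (rank x d_in) and B (d_out x rank),
    while the original d_out x d_in weight stays frozen.
    """
    return rank * (d_in + d_out)


def last_token_index(attention_mask):
    """Index of the final non-padding token, assuming RIGHT padding."""
    return sum(attention_mask) - 1


def classify(hidden_states, attention_mask, head_weights, head_bias):
    """Apply a linear head to the final real token's hidden state.

    hidden_states: one vector per token position (list of lists of floats).
    Returns one logit per class.
    """
    h = hidden_states[last_token_index(attention_mask)]
    return [
        sum(w * x for w, x in zip(row, h)) + b
        for row, b in zip(head_weights, head_bias)
    ]


# Hypothetical configuration (assumption, not taken from the paper).
HIDDEN = 4096          # hidden size of the decoder
LAYERS = 32            # number of transformer layers
ADAPTED_PER_LAYER = 2  # square projection matrices given a LoRA adapter
RANK = 8               # LoRA rank r
NUM_CLASSES = 5        # matches the 5-class single-label task

lora_total = LAYERS * ADAPTED_PER_LAYER * lora_param_count(HIDDEN, HIDDEN, RANK)
head_total = HIDDEN * NUM_CLASSES + NUM_CLASSES  # weights plus biases
trainable = lora_total + head_total              # 4,214,789 under these assumptions

# Toy forward pass: three positions, the last one is padding.
toy_hidden = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
toy_mask = [1, 1, 0]
toy_logits = classify(toy_hidden, toy_mask, [[1.0, 0.0], [0.0, 2.0]], [0.5, 0.0])
predicted = max(range(len(toy_logits)), key=toy_logits.__getitem__)

print(f"trainable parameters: {trainable:,}")
print(f"toy logits: {toy_logits}, predicted class: {predicted}")
```

Under these assumed settings the trainable budget is about 4.2M parameters. For scale, BERT-base has roughly 110M parameters, so a fully fine-tuned baseline of that size would train about 26 times as many, which is in line with the 10-30x range reported above. With left padding, the final token would instead sit at the last position of every sequence, so the index computation would need to change.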