Prompt Tuning is emerging as a scalable and cost-effective method for fine-tuning Pretrained Language Models (PLMs), which are often referred to as Large Language Models (LLMs). This study benchmarks the performance and computational efficiency of Prompt Tuning and baseline methods for multi-label text classification. The benchmark is applied to the challenging task of classifying companies into an investment firm's proprietary industry taxonomy in support of its thematic investment strategy. Text-to-text classification is frequently reported to outperform task-specific classification heads, but it has several limitations when applied to a multi-label classification problem in which each label consists of multiple tokens: (a) generated labels may not match any label in the label taxonomy; (b) the fine-tuning process lacks permutation invariance and is sensitive to the order of the provided labels; (c) the model provides binary decisions rather than appropriate confidence scores. Limitation (a) is addressed by applying constrained decoding using Trie Search, which slightly improves classification performance. All three limitations, (a), (b), and (c), are addressed by replacing the PLM's language head with a classification head, an approach referred to as Prompt Tuned Embedding Classification (PTEC). This improves performance significantly while also reducing computational costs during inference. In our industrial application, the training data is skewed towards well-known companies. We confirm that the model's performance is consistent across both well-known and less-known companies. Overall, our results indicate a continuing need to adapt state-of-the-art methods to domain-specific tasks, even in the era of PLMs with strong generalization abilities. We release our codebase and a benchmarking dataset at https://github.com/EQTPartners/PTEC.
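To make the idea behind limitation (a) concrete, the following is a minimal sketch of trie-constrained decoding, written under stated assumptions and not taken from the released codebase. Labels are assumed to be pre-tokenized into integer ids; the names `build_trie`, `allowed_next`, `constrained_greedy`, the `END` sentinel, and the toy scoring functions are all hypothetical and stand in for a real tokenizer and language model.

```python
from typing import Callable, Dict, List, Sequence

END = -1  # hypothetical sentinel: a complete label may stop here


def build_trie(labels: Sequence[Sequence[int]]) -> Dict:
    """Build a nested-dict trie from tokenized labels (lists of token ids)."""
    root: Dict = {}
    for tokens in labels:
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
        node[END] = {}  # mark that a valid label ends at this node
    return root


def allowed_next(trie: Dict, prefix: Sequence[int]) -> List[int]:
    """Return the token ids that may follow `prefix`; empty if prefix is invalid."""
    node = trie
    for tok in prefix:
        if tok not in node:
            return []
        node = node[tok]
    return sorted(node.keys())


def constrained_greedy(
    trie: Dict, score_fn: Callable[[Sequence[int], int], float]
) -> List[int]:
    """Greedy decoding that only ever picks tokens permitted by the trie.

    `score_fn(prefix, token)` is a stand-in for the model's next-token score.
    The loop terminates because every step either stops or descends the trie.
    """
    prefix: List[int] = []
    while True:
        options = allowed_next(trie, prefix)
        best = max(options, key=lambda t: score_fn(prefix, t))
        if best == END:
            return prefix
        prefix.append(best)


# Toy taxonomy of three labels with made-up token ids.
trie = build_trie([[5, 7], [5, 8, 2], [9]])
label = constrained_greedy(trie, lambda prefix, t: -float(t))
```

Because candidate tokens are restricted to the children of the current trie node, the decoded sequence is always a label from the taxonomy, whatever the unconstrained scores would have preferred. This only addresses limitation (a); order sensitivity and the lack of confidence scores are untouched by it.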