Prompt Tuning is emerging as a scalable and cost-effective method to fine-tune Pretrained Language Models (PLMs), which are often referred to as Large Language Models (LLMs). This study benchmarks the performance and computational efficiency of Prompt Tuning and baselines for multi-label text classification. We apply these methods to the challenging task of classifying companies into an investment firm's proprietary industry taxonomy, which supports the firm's thematic investment strategy. Text-to-text classification is frequently reported to outperform task-specific classification heads, but it has several limitations when applied to a multi-label classification problem where each label consists of multiple tokens: (a) generated labels may not match any label in the label taxonomy; (b) the fine-tuning process lacks permutation invariance and is sensitive to the order of the provided labels; (c) the model provides binary decisions rather than appropriate confidence scores. Limitation (a) is addressed by applying constrained decoding using Trie Search, which slightly improves classification performance. All three limitations are addressed by replacing the PLM's language head with a classification head, an approach we refer to as Prompt Tuned Embedding Classification (PTEC). PTEC improves performance significantly while also reducing computational costs during inference. In our industrial application, the training data is skewed towards well-known companies. We confirm that the model's performance is consistent across both well-known and less-known companies. Our overall results indicate the continuing need to adapt state-of-the-art methods to domain-specific tasks, even in the era of PLMs with strong generalization abilities. We release our codebase and a benchmarking dataset at https://github.com/EQTPartners/PTEC.
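To illustrate how trie-constrained decoding can address limitation (a), the following is a minimal sketch and not the implementation in the released codebase. It assumes word-level tokens instead of a real subword tokenizer, a toy score function standing in for the PLM's next-token scores, and greedy decoding of a single label. All names (`LabelTrie`, `constrained_greedy_decode`, `toy_score`, the example labels) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

END = "<eos>"  # hypothetical end-of-label marker


class LabelTrie:
    """Prefix tree over the token sequences of all valid labels."""

    def __init__(self, labels: Sequence[Sequence[str]]) -> None:
        self.root: Dict[str, dict] = {}
        for tokens in labels:
            node = self.root
            for tok in tokens:
                node = node.setdefault(tok, {})
            node[END] = {}  # mark that a complete label ends here

    def allowed_next(self, prefix: Sequence[str]) -> List[str]:
        """Return the tokens that may follow `prefix` inside some valid label."""
        node = self.root
        for tok in prefix:
            if tok not in node:
                return []  # prefix is not part of any label
            node = node[tok]
        return list(node.keys())


def constrained_greedy_decode(
    trie: LabelTrie,
    score_fn: Callable[[Sequence[str], str], float],
) -> List[str]:
    """Greedily pick the best-scoring token among those the trie allows."""
    prefix: List[str] = []
    while True:  # terminates because the trie has finite depth
        allowed = trie.allowed_next(prefix)
        if not allowed:
            break
        best = max(allowed, key=lambda tok: score_fn(prefix, tok))
        if best == END:
            break
        prefix.append(best)
    return prefix


# Toy taxonomy with multi-token labels (assumed for illustration only).
labels = [
    ["renewable", "energy"],
    ["renewable", "materials"],
    ["health", "care"],
]
trie = LabelTrie(labels)

# Toy stand-in for model scores; it ignores the prefix, so without the
# trie constraint nothing would prevent an invalid token sequence.
toy_scores = {
    "renewable": 0.2, "health": 0.9, "energy": 0.8,
    "care": 0.3, "materials": 0.1, END: 0.5,
}


def toy_score(prefix: Sequence[str], token: str) -> float:
    return toy_scores.get(token, 0.0)


decoded = constrained_greedy_decode(trie, toy_score)
print(" ".join(decoded))
```

Because every decoding step is restricted to children of the current trie node, the output is always a label from the taxonomy. The sketch does not address limitations (b) and (c), which the abstract attributes to the text-to-text formulation itself.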