Towards Probing Contact Center Large Language Models

Fine-tuning large language models (LLMs) with domain-specific instructions has emerged as an effective method to enhance their domain-specific understanding. Yet, there is limited work that examines the core characteristics acquired during this process. In this study, we benchmark the fundamental characteristics learned by contact-center (CC) specific instruction fine-tuned LLMs with out-of-the-box (OOB) LLMs via probing tasks encompassing conversational, channel, and automatic speech recognition (ASR) properties. We explore different LLM architectures (Flan-T5 and Llama), sizes (3B, 7B, 11B, 13B), and fine-tuning paradigms (full fine-tuning vs PEFT). Our findings reveal remarkable effectiveness of CC-LLMs on the in-domain downstream tasks, with improvement in response acceptability by over 48% compared to OOB-LLMs. Additionally, we compare the performance of OOB-LLMs and CC-LLMs on the widely used SentEval dataset, and assess their capabilities in terms of surface, syntactic, and semantic information through probing tasks. Intriguingly, we note a relatively consistent performance of probing classifiers on the set of probing tasks. Our observations indicate that CC-LLMs, while outperforming their out-of-the-box counterparts, exhibit a tendency to rely less on encoding surface, syntactic, and semantic properties, highlighting the intricate interplay between domain-specific adaptation and probing task performance opening up opportunities to explore behavior of fine-tuned language models in specialized contexts.

翻译：通过领域特定指令微调大语言模型（LLMs）已被证明是增强其领域理解能力的有效方法。然而，目前鲜有研究深入探究此过程中习得的核心特征。本研究通过涵盖对话、信道和自动语音识别（ASR）特性的探针任务，对经联络中心（CC）特定指令微调的大语言模型与未微调（OOB）大语言模型的基础特征进行基准测试。我们探索了不同LLM架构（Flan-T5和Llama）、参数量级（3B、7B、11B、13B）以及微调范式（全量微调对比PEFT）。研究发现，CC-LLMs在领域内下游任务上表现显著提升，响应可接受度相较于OOB-LLMs提升超过48%。此外，我们通过探针任务比较了OOB-LLMs与CC-LLMs在广泛使用的SentEval数据集上的性能，并评估了其在表层、句法和语义信息方面的能力。有趣的是，我们观察到探针分类器在各类探针任务上的表现相对一致。结果表明，CC-LLMs虽在性能上超越未微调模型，但表现出对编码表层、句法和语义属性的依赖性降低，这揭示了领域适配与探针任务性能之间的复杂交互关系，为探索特定场景下微调语言模型的行为开辟了新方向。

相关内容

大语言模型

关注 67

大语言模型是基于海量文本数据训练的深度学习模型。它不仅能够生成自然语言文本，还能够深入理解文本含义，处理各种自然语言任务，如文本摘要、问答、翻译等。2023年，大语言模型及其在人工智能领域的应用已成为全球科技研究的热点，其在规模上的增长尤为引人注目，参数量已从最初的十几亿跃升到如今的一万亿。参数量的提升使得模型能够更加精细地捕捉人类语言微妙之处，更加深入地理解人类语言的复杂性。在过去的一年里，大语言模型在吸纳新知识、分解复杂任务以及图文对齐等多方面都有显著提升。随着技术的不断成熟，它将不断拓展其应用范围，为人类提供更加智能化和个性化的服务，进一步改善人们的生活和生产方式。

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日