Intent detection is a critical component of task-oriented dialogue systems (TODS), enabling the identification of suitable actions to address user utterances at each dialogue turn. Traditional approaches have relied on computationally efficient supervised sentence-transformer encoder models, which require substantial training data and struggle with out-of-scope (OOS) detection. The emergence of generative large language models (LLMs) with intrinsic world knowledge presents new opportunities to address these challenges. In this work, we adapt seven state-of-the-art (SOTA) LLMs using adaptive in-context learning and chain-of-thought prompting for intent detection, and compare their performance with contrastively fine-tuned sentence-transformer (SetFit) models to highlight the trade-off between prediction quality and latency. We propose a hybrid system that combines the two approaches via an uncertainty-based routing strategy; together with negative data augmentation, it achieves the best of both worlds (i.e., within 2% of native LLM accuracy at 50% lower latency). To better understand LLM OOS-detection capabilities, we perform controlled experiments revealing that this capability is significantly influenced by the scope of the intent labels and the size of the label space. We also introduce a two-step approach that utilizes internal LLM representations, yielding empirical gains of more than 5% in OOS-detection accuracy and F1-score for the Mistral-7B model.
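The uncertainty-based routing idea above can be illustrated with a minimal sketch. This is not the paper's implementation; the threshold value and the use of max softmax probability as the confidence signal are illustrative assumptions. The fast SetFit encoder answers when it is confident, and only uncertain utterances are escalated to the slower LLM, which is how the hybrid system can stay close to native LLM accuracy at roughly half the latency.

```python
import math

def softmax(scores):
    """Convert raw classifier scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route(encoder_scores, threshold=0.7):
    """Uncertainty-based routing (illustrative sketch).

    If the encoder's top softmax probability clears the threshold,
    accept its predicted intent index; otherwise defer to the LLM.
    The threshold 0.7 is a placeholder, not a value from the paper.
    """
    probs = softmax(encoder_scores)
    confidence = max(probs)
    if confidence >= threshold:
        return "encoder", probs.index(confidence)
    return "llm", None

# Confident case: the encoder's prediction is kept.
print(route([5.0, 0.1, 0.2]))   # routed to the encoder, intent 0
# Uncertain case: scores are close, so the utterance goes to the LLM.
print(route([1.0, 0.9, 1.1]))   # routed to the LLM
```

In practice the threshold would be tuned on a validation set to balance the fraction of traffic sent to the LLM (latency) against the accuracy recovered on hard or out-of-scope utterances.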