Out-of-domain (OOD) intent detection aims to determine whether a user's query falls outside the predefined domain of the system, which is crucial for the proper functioning of task-oriented dialogue (TOD) systems. Previous methods address it by fine-tuning discriminative models. Recently, some studies have explored applying large language models (LLMs), represented by ChatGPT, to various downstream tasks, but their ability on the OOD detection task remains unclear. This paper conducts a comprehensive evaluation of LLMs under various experimental settings and then outlines their strengths and weaknesses. We find that LLMs exhibit strong zero-shot and few-shot capabilities, but are still at a disadvantage compared to models fine-tuned with full resources. Going further, through a series of additional analysis experiments, we discuss and summarize the challenges faced by LLMs and provide guidance for future work, including injecting domain knowledge, strengthening knowledge transfer from in-domain (IND) to OOD, and understanding long instructions.