Large language models (LLMs) have been applied to diverse tasks in natural language processing (NLP), yet they remain under-explored for task-oriented dialogue systems (TODS), especially end-to-end TODS. We present InstructTODS, a novel off-the-shelf framework for zero-shot end-to-end task-oriented dialogue systems that adapts to diverse domains without fine-tuning. By leveraging LLMs, InstructTODS generates a proxy belief state that translates user intentions into dynamic queries for efficient interaction with any knowledge base (KB). Our extensive experiments demonstrate that InstructTODS performs comparably to fully fine-tuned TODS in guiding dialogues to successful completion, without prior knowledge or task-specific data. Furthermore, a rigorous human evaluation of end-to-end TODS shows that InstructTODS produces dialogue responses that notably outperform both the gold responses and the state-of-the-art TODS in terms of helpfulness, informativeness, and humanness. The effectiveness of LLMs in TODS is further supported by our comprehensive evaluations on three TODS subtasks: dialogue state tracking, intent classification, and response generation. Code and implementations are available at https://github.com/WillyHC22/InstructTODS/