Programming LLM-based knowledge and task assistants that faithfully conform to developer-provided policies is challenging. These agents must retrieve and provide consistent, accurate, and relevant information to address users' queries and needs. Yet such agents often generate unfounded responses ("hallucinate"), while traditional dialogue trees can only handle a limited number of conversation flows, making them inherently brittle. To this end, we present KITA, a programmable framework for creating task-oriented conversational agents designed to handle complex user interactions. Unlike raw LLMs, KITA provides reliable, grounded responses, with controllable agent policies expressed through its specification language, the KITA Worksheet. In contrast to dialogue trees, it is resilient to diverse user queries, helpful with knowledge sources, and makes policies easy to program through its declarative paradigm. In a real-user study with 62 participants, KITA outperforms a GPT-4 function-calling baseline by 26.1, 22.5, and 52.4 points on execution accuracy, dialogue act accuracy, and goal completion rate, respectively. We also release 22 real-user conversations with KITA, manually corrected to ensure accuracy.