We describe a class of tasks called decision-oriented dialogues, in which AI assistants must collaborate with one or more humans via natural language to help them make complex decisions. We formalize three domains in which users face everyday decisions: (1) choosing an assignment of reviewers to conference papers, (2) planning a multi-step itinerary in a city, and (3) negotiating travel plans for a group of friends. In each of these settings, AI assistants and users have disparate abilities that they must combine to arrive at the best decision: assistants can access and process large amounts of information, while users have preferences and constraints external to the system. For each task, we build a dialogue environment where agents receive a reward based on the quality of the final decision they reach. Using these environments, we collect human-human dialogues with humans playing the role of assistant. To compare how current AI assistants communicate in these settings, we present baselines using large language models in self-play. Finally, we highlight a number of challenges models face in decision-oriented dialogues, ranging from efficient communication to reasoning and optimization, and release our environments as a testbed for future modeling work.