We describe a class of tasks called dialogue decision problems, in which AI assistants must collaborate with one or more humans via natural language to help them make complex decisions. We formalize three domains in which users face everyday decisions: (1) choosing an assignment of reviewers to conference papers, (2) planning a multi-step itinerary in a city, and (3) negotiating travel plans for a group of friends. In each of these settings, AI assistants and users have disparate abilities that they must combine to arrive at the best decision: assistants can access and process large amounts of information, while users have preferences and constraints external to the system. For each task, we build a dialogue environment where agents receive a reward based on the quality of the final decision they reach. Using these environments, we collect human-human dialogues with humans playing the role of assistant. To compare how current AI assistants communicate in these settings, we present baselines using large language models in self-play. Finally, we highlight a number of challenges models face in decision-oriented dialogues, ranging from efficient communication to reasoning and optimization, and release our environments as a testbed for future modeling work.