AI alignment is about ensuring that AI systems pursue only goals and activities that are beneficial to humans. Most current approaches to AI alignment learn what humans value from their behavioural data. This paper proposes a different perspective on alignment by introducing AI Alignment Dialogues: dialogues through which users and agents try to achieve and maintain alignment via interaction. We argue that alignment dialogues have a number of advantages over data-driven approaches, especially for behaviour support agents, which aim to support users in achieving their desired future behaviours rather than their current behaviours. These advantages include allowing users to convey higher-level concepts directly to the agent, and making the agent more transparent and trustworthy. In this paper we outline the concept and high-level structure of alignment dialogues. Moreover, we conducted a qualitative focus group user study, from which we developed a model that describes how alignment dialogues affect users and derived design suggestions for AI alignment dialogues. Through this we establish foundations for AI alignment dialogues and shed light on what requires further development and research.