Manipulation is a common concern in many domains, such as social media, advertising, and chatbots. As AI systems mediate more of our interactions with the world, it is important to understand the degree to which AI systems might manipulate humans \textit{without the intent of the system designers}. Our work clarifies challenges in defining and measuring manipulation in the context of AI systems. First, we build upon prior literature on manipulation from other fields and characterize the space of possible notions of manipulation, which we find to depend upon the concepts of incentives, intent, harm, and covertness. We review proposals on how to operationalize each factor. Second, we propose a definition of manipulation based on our characterization: a system is manipulative \textit{if it acts as if it were pursuing an incentive to change a human (or another agent) intentionally and covertly}. Third, we discuss the connections between manipulation and related concepts, such as deception and coercion. Finally, we contextualize our operationalization of manipulation in some applications. Our overall assessment is that while some progress has been made in defining and measuring manipulation by AI systems, many gaps remain. In the absence of a consensus definition and reliable tools for measurement, we cannot rule out the possibility that AI systems learn to manipulate humans without the intent of the system designers. We argue that such manipulation poses a significant threat to human autonomy, suggesting that precautionary actions to mitigate it are warranted.