Characterizing Manipulation from AI Systems

Manipulation is a common concern in many domains, such as social media, advertising, and chatbots. As AI systems mediate more of our interactions with the world, it is important to understand the degree to which AI systems might manipulate humans \textit{without the intent of the system designers}. Our work clarifies challenges in defining and measuring manipulation in the context of AI systems. First, we build upon prior literature on manipulation from other fields and characterize the space of possible notions of manipulation, which we find to depend upon the concepts of incentives, intent, harm, and covertness. We review proposals on how to operationalize each factor. Second, we propose a definition of manipulation based on our characterization: a system is manipulative \textit{if it acts as if it were pursuing an incentive to change a human (or another agent) intentionally and covertly}. Third, we discuss the connections between manipulation and related concepts, such as deception and coercion. Finally, we contextualize our operationalization of manipulation in some applications. Our overall assessment is that while some progress has been made in defining and measuring manipulation from AI systems, many gaps remain. In the absence of a consensus definition and reliable tools for measurement, we cannot rule out the possibility that AI systems learn to manipulate humans without the intent of the system designers. We argue that such manipulation poses a significant threat to human autonomy, suggesting that precautionary actions to mitigate it are warranted.

