A Mechanism-Based Approach to Mitigating Harms from Persuasive Generative AI

Seliem El-Sayed, Canfer Akbulut, Amanda McCroskery, Geoff Keeling, Zachary Kenton, Zaria Jalan, Nahema Marchal, Arianna Manzini, Toby Shevlane, Shannon Vallor, Daniel Susser, Matija Franklin, Sophie Bridgers, Harry Law, Matthew Rahtz, Murray Shanahan, Michael Henry Tessler, Arthur Douillard, Tom Everitt, Sasha Brown

Recent generative AI systems have demonstrated more advanced persuasive capabilities and are increasingly permeating areas of life where they can influence decision-making. Generative AI presents a new risk profile of persuasion due to the opportunity for reciprocal exchange and prolonged interactions. This has led to growing concerns about harms from AI persuasion and how they can be mitigated, highlighting the need for a systematic study of AI persuasion. The current definitions of AI persuasion are unclear and related harms are insufficiently studied. Existing harm mitigation approaches prioritise harms from the outcome of persuasion over harms from the process of persuasion. In this paper, we lay the groundwork for the systematic study of AI persuasion. We first put forward definitions of persuasive generative AI. We distinguish between rationally persuasive generative AI, which relies on providing relevant facts, sound reasoning, or other forms of trustworthy evidence, and manipulative generative AI, which relies on taking advantage of cognitive biases and heuristics or misrepresenting information. We also put forward a map of harms from AI persuasion, including definitions and examples of economic, physical, environmental, psychological, sociocultural, political, privacy, and autonomy harm. We then introduce a map of mechanisms that contribute to harmful persuasion. Lastly, we provide an overview of approaches that can be used to mitigate process harms of persuasion, including prompt engineering for manipulation classification and red teaming. Future work will operationalise these mitigations and study the interaction between different types of mechanisms of persuasion.

