The development and popularization of large language models (LLMs) have raised concerns that they could be used to create tailor-made, convincing arguments that push false or misleading narratives online. Early work has found that language models can generate content perceived as at least on par with, and often more persuasive than, human-written messages. However, little is known about LLMs' persuasive capabilities in direct conversations with human counterparts, or about how personalization can improve their performance. In this pre-registered study, we analyze the effect of AI-driven persuasion in a controlled, harmless setting. We created a web-based platform where participants engaged in short, multiple-round debates with a live opponent. Each participant was randomly assigned to one of four treatment conditions, corresponding to a two-by-two factorial design: (1) games are played either between two humans or between a human and an LLM; (2) personalization is either enabled or disabled, granting one of the two players access to basic sociodemographic information about their opponent. We found that participants who debated GPT-4 with access to their personal information had 81.7% higher odds of increased agreement with their opponents than participants who debated humans (p < 0.01; N = 820 unique participants). Without personalization, GPT-4 still outperformed humans, but the effect was smaller and not statistically significant (p = 0.31). Overall, our results suggest that concerns about personalization are meaningful, with important implications for the governance of social media and the design of new online environments.
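To make the headline figure concrete: "81.7% higher odds" is an odds ratio of 1.817, and its effect on the probability of increased agreement depends on the baseline rate. The sketch below, with a purely hypothetical baseline of 30% (not a number from the study), shows how an odds ratio maps back to a probability.

```python
# Illustrative only: converting an odds ratio into a probability shift.
# The 0.30 baseline is a hypothetical placeholder, NOT a result from the study;
# the odds ratio 1.817 corresponds to the reported 81.7% increase in odds.

def apply_odds_ratio(p_baseline: float, odds_ratio: float) -> float:
    """Return the probability implied by scaling baseline odds by an odds ratio."""
    odds = p_baseline / (1.0 - p_baseline)   # convert probability to odds
    new_odds = odds * odds_ratio             # apply the multiplicative effect
    return new_odds / (1.0 + new_odds)       # convert back to a probability

p_human = 0.30                               # hypothetical agreement-shift rate vs. humans
p_llm = apply_odds_ratio(p_human, 1.817)     # implied rate vs. personalized GPT-4
print(f"baseline {p_human:.2f} -> personalized-LLM condition {p_llm:.3f}")
```

Because the odds-to-probability mapping is nonlinear, the same odds ratio implies a larger absolute change near a 50% baseline and a smaller one near the extremes.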