The remarkable and ever-increasing capabilities of Large Language Models (LLMs) have raised concerns about their potential misuse for creating personalized, convincing misinformation and propaganda. To gain insights into LLMs' persuasive capabilities without directly engaging in experimentation with humans, we propose studying their performance on the related task of detecting convincing arguments. We extend a dataset by Durmus & Cardie (2018) with debates, votes, and user traits and propose tasks measuring LLMs' ability to (1) distinguish between strong and weak arguments, (2) predict stances based on beliefs and demographic characteristics, and (3) determine the appeal of an argument to an individual based on their traits. We show that LLMs perform on par with humans in these tasks and that combining predictions from different LLMs yields significant performance gains, even surpassing human performance. The data and code released with this paper contribute to the crucial ongoing effort of continuously evaluating and monitoring the rapidly evolving capabilities and potential impact of LLMs.