Concerns persist regarding the capacity of Large Language Models (LLMs) to sway political views. Although prior research has claimed that LLMs are not more persuasive than standard political campaign practices, the recent rise of frontier models warrants further study. In two survey experiments (N=19,145) across bipartisan issues and stances, we evaluate seven state-of-the-art LLMs developed by Anthropic, OpenAI, Google, and xAI. We find that LLMs outperform standard campaign advertisements, with heterogeneity in performance across models. Specifically, Claude models exhibit the highest persuasiveness, while Grok exhibits the lowest. The results are robust across issues and stances. Moreover, in contrast to the findings in Hackenburg et al. (2025b) and Lin et al. (2025) that information-based prompts boost persuasiveness, we find that the effectiveness of information-based prompts is model-dependent: they increase the persuasiveness of Claude and Grok while substantially reducing that of GPT. We introduce a data-driven and strategy-agnostic LLM-assisted conversation analysis approach to identify and assess underlying persuasive strategies. Our work benchmarks the persuasive risks of frontier models and provides a framework for cross-model comparative risk assessment.