Political Neutrality as Balanced Approval: A Large-Scale Human Evaluation of AI Responses

As AI systems increasingly shape political views, defining and evaluating AI political neutrality is an urgent problem. Here, we propose a new definition of AI political neutrality and design a large-scale user study to test it, releasing a new dataset, PARETO, with 7,434 participants and 208,152 evaluations of AI responses. Our definition follows a simple principle grounded in political theory: when asked about a controversial issue, an AI model should generate responses that maximize approval across groups with opposing viewpoints while keeping approval balanced between those groups. This definition allows empirical testing of whether an AI response is "neutral" and generalizes to any political context without presupposing a single left-right axis of division. We construct a benchmark of controversial U.S. issues, with prompts sourced from politically charged questions on Reddit and responses from frontier AI models, and recruit human participants to rate those responses. Across all 20 issues, we find that AI responses can achieve high rates of approval on both sides, even when those sides disagree strongly with each other on the substance of the issue. We also find that the default responses of GPT, Gemini, Claude, and Llama lean liberal, while those of Grok do not, and that politically charged user prompts are harder to respond to than neutral prompts. This work introduces a rigorous definition of, and benchmark for, AI political neutrality, along with a dataset to measure progress toward it.
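To make the "maximize approval, balance approval" principle concrete, here is a minimal sketch of how per-group ratings could be turned into neutrality scores. It is an illustration under stated assumptions, not the paper's metric: it assumes binary approve/disapprove ratings, operationalizes "maximize approval across groups" as the worst-off group's approval rate (a maximin choice), and "balance" as the gap between the highest and lowest group approval rates. The names `Rating`, `approval_by_group`, `neutrality_scores`, `lean`, and `most_neutral`, and the group labels `"A"`/`"B"`, are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


# Hypothetical rating record: one participant's binary judgment of one AI
# response, tagged with the participant's side on the issue (assumed labels).
@dataclass(frozen=True)
class Rating:
    response_id: str
    group: str
    approve: bool


def approval_by_group(ratings: Sequence[Rating], response_id: str) -> Dict[str, float]:
    """Fraction of raters in each group who approved the given response."""
    totals: Dict[str, int] = defaultdict(int)
    approvals: Dict[str, int] = defaultdict(int)
    for r in ratings:
        if r.response_id == response_id:
            totals[r.group] += 1
            approvals[r.group] += int(r.approve)
    return {g: approvals[g] / totals[g] for g in totals}


def neutrality_scores(rates: Dict[str, float]) -> Tuple[float, float]:
    """Return (worst-group approval, gap between best and worst group)."""
    values = list(rates.values())
    return min(values), max(values) - min(values)


def lean(rates: Dict[str, float], group_1: str, group_2: str) -> float:
    """Signed approval difference; positive means the response favors group_1."""
    return rates[group_1] - rates[group_2]


def most_neutral(ratings: Sequence[Rating], response_ids: Sequence[str]) -> str:
    """Pick the response with the highest worst-group approval; break ties by smaller gap."""
    def key(rid: str) -> Tuple[float, float]:
        lowest, gap = neutrality_scores(approval_by_group(ratings, rid))
        return (lowest, -gap)

    return max(response_ids, key=key)


# Toy data (invented for illustration only): r1 pleases group A more than B,
# while r2 draws the same approval rate from both groups.
example = [
    Rating("r1", "A", True), Rating("r1", "A", True),
    Rating("r1", "B", False), Rating("r1", "B", True),
    Rating("r2", "A", True), Rating("r2", "A", False),
    Rating("r2", "B", True), Rating("r2", "B", False),
]

print(approval_by_group(example, "r1"))       # {'A': 1.0, 'B': 0.5}
print(most_neutral(example, ["r1", "r2"]))    # r2
```

In the toy data both responses have the same worst-group approval (0.5), so the balance term decides: `r2` wins because its gap is 0 rather than 0.5. Because groups are just labels, the same computation applies to any pair (or set) of opposing groups, not only a left-right split; other aggregations, such as a Pareto-front comparison over group approval rates, would be equally reasonable choices.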



