As AI systems increasingly shape political views, defining and evaluating AI political neutrality is an urgent problem. Here, we propose a new definition of AI political neutrality and design a large-scale user study to test it, releasing a new dataset, PARETO, with 7,434 participants and 208,152 evaluations of AI responses. Our definition follows a simple principle grounded in political theory: when asked about a controversial issue, an AI model should generate responses that maximize approval across groups with opposing viewpoints while balancing approval between those groups. This definition allows empirical testing of whether an AI response is "neutral" and generalizes to any political context without presupposing a single left-right axis of division. We construct a benchmark of controversial U.S. issues, with prompts sourced from politically charged questions on Reddit and responses from frontier AI models, and recruit human participants to rate the AI responses. Across all 20 issues, we find that AI responses can achieve high rates of approval on both sides, even as those sides disagree strongly with each other on the substance of the issues. We also find that default responses lean liberal for GPT, Gemini, Claude, and Llama, but not Grok, and that politically charged user prompts are harder to respond to than neutral prompts. This work introduces a rigorous definition and benchmark of AI political neutrality, and a dataset to measure progress toward it.
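To make the principle concrete, the following is a minimal sketch of one way to read "maximize approval across groups while balancing approval between groups". It is not the scoring procedure used in the study, which the abstract does not specify. It assumes exactly two opposing groups, treats a response as a candidate when no other response is at least as approved by both groups and strictly more approved by one (a Pareto front), and then picks the candidate with the smallest approval gap. The names `ResponseApproval`, `pareto_front`, and `most_balanced`, and all approval values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseApproval:
    """Approval rates (0 to 1) for one AI response, split by two opposing groups."""
    response_id: str
    approval_a: float  # share of group A raters who approve (hypothetical value)
    approval_b: float  # share of group B raters who approve (hypothetical value)

    @property
    def gap(self) -> float:
        """Absolute difference in approval between the two groups."""
        return abs(self.approval_a - self.approval_b)


def dominates(x: ResponseApproval, y: ResponseApproval) -> bool:
    """True if x is at least as approved as y by both groups and strictly more by one."""
    at_least_as_good = x.approval_a >= y.approval_a and x.approval_b >= y.approval_b
    strictly_better = x.approval_a > y.approval_a or x.approval_b > y.approval_b
    return at_least_as_good and strictly_better


def pareto_front(responses: list[ResponseApproval]) -> list[ResponseApproval]:
    """Responses that no other response dominates, in their original order."""
    return [r for r in responses if not any(dominates(o, r) for o in responses)]


def most_balanced(responses: list[ResponseApproval]) -> ResponseApproval:
    """Assumed tie-breaking rule: the Pareto-optimal response with the smallest gap."""
    return min(pareto_front(responses), key=lambda r: r.gap)


# Made-up approval rates for four candidate responses to a single issue.
EXAMPLE = [
    ResponseApproval("r1", 0.80, 0.40),
    ResponseApproval("r2", 0.70, 0.68),
    ResponseApproval("r3", 0.60, 0.60),
    ResponseApproval("r4", 0.45, 0.82),
]

if __name__ == "__main__":
    print([r.response_id for r in pareto_front(EXAMPLE)])
    print(most_balanced(EXAMPLE).response_id)
```

In this toy data, `r3` is perfectly balanced but is dominated by `r2`, which both groups approve of more. This illustrates why balance alone is not enough under the stated principle.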