PoliCon: Evaluating LLMs on Achieving Diverse Political Consensus Objectives

Achieving political consensus is crucial to the effective functioning of social governance, yet it is difficult. Although frontier AI systems, represented by large language models (LLMs), have developed rapidly in recent years, their capabilities in this area remain understudied. In this paper, we introduce PoliCon, a novel benchmark constructed from 2,225 high-quality deliberation records of the European Parliament spanning 13 years, from 2009 to 2022. PoliCon evaluates the ability of LLMs to draft consensus resolutions from divergent party positions under varying collective decision-making contexts and political requirements. Specifically, PoliCon builds each task environment from four factors, each of which changes the consensus to be found: the specific political issue, the political goal, the participating parties, and the power structure given by the seat distribution. We also develop an evaluation framework for PoliCon based on social choice theory, which simulates the real voting outcomes of different political parties to assess whether an LLM-generated resolution meets the requirements of the predetermined political consensus. Our experimental results show that even state-of-the-art models perform unsatisfactorily on complex tasks, such as passing resolutions by a two-thirds majority and addressing security issues. The experiments also uncover the models' inherent partisan biases and reveal some of the behaviors LLMs exhibit when seeking consensus, such as prioritizing the stance of the dominant party instead of uniting smaller parties. These findings highlight PoliCon's promise as an effective platform for studying the ability of LLMs to promote political consensus. The code and dataset are released at https://zowiezhang.github.io/projects/PoliCon.
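To make the role of the power structure and the political goal concrete, the following is a minimal sketch of a seat-weighted vote check. It is not PoliCon's evaluation code: the function name `resolution_passes`, the goal labels, the party names, the seat counts, and the assumption that each party votes as a single bloc are all hypothetical and chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical goal labels mapped to (threshold, strict) pairs.
# A simple majority is assumed to need strictly more than half the seats;
# a two-thirds majority is assumed to need at least two thirds.
GOALS = {
    "simple_majority": (Fraction(1, 2), True),
    "two_thirds_majority": (Fraction(2, 3), False),
}


def resolution_passes(seats, in_favor, goal):
    """Check whether the seat-weighted share in favor meets the goal.

    seats:    dict mapping party name to seat count (illustrative values)
    in_favor: set of party names assumed to vote for the resolution as a bloc
    goal:     one of the keys in GOALS
    """
    threshold, strict = GOALS[goal]
    total = sum(seats.values())
    supporting = sum(n for party, n in seats.items() if party in in_favor)
    share = Fraction(supporting, total)  # exact arithmetic, no float rounding
    return share > threshold if strict else share >= threshold


# Illustrative seat distribution; these are not real European Parliament numbers.
seats = {"Party A": 45, "Party B": 35, "Party C": 20}
in_favor = {"Party A", "Party C"}  # 65 of 100 seats

print(resolution_passes(seats, in_favor, "simple_majority"))
print(resolution_passes(seats, in_favor, "two_thirds_majority"))
```

With these illustrative numbers, the same set of supporting parties clears a simple majority but falls short of two thirds, which is one way to see why the stricter goal is harder to satisfy.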