Uncovering Political Bias in Large Language Models using Parliamentary Voting Records

As large language models (LLMs) become deeply embedded in digital platforms and decision-making systems, concerns about their political biases have grown. While substantial work has examined social biases such as gender and race, systematic studies of political bias remain limited, despite its direct societal impact. This paper introduces a general methodology for constructing political bias benchmarks by aligning model-generated voting predictions with verified parliamentary voting records. We instantiate this methodology in three national case studies: PoliBiasNL (2,701 Dutch parliamentary motions and votes from 15 political parties), PoliBiasNO (10,584 motions and votes from 9 Norwegian parties), and PoliBiasES (2,480 motions and votes from 10 Spanish parties). Across these benchmarks, we assess ideological tendencies and political entity bias in LLM behavior. As part of our evaluation framework, we also propose a method to visualize the ideology of LLMs and political parties in a shared two-dimensional CHES (Chapel Hill Expert Survey) space by linking their voting-based positions to the CHES dimensions, enabling direct and interpretable comparisons between models and real-world political actors. Our experiments reveal fine-grained ideological distinctions: state-of-the-art LLMs consistently display left-leaning or centrist tendencies, alongside clear negative biases toward right-conservative parties. These findings highlight the value of transparent, cross-national evaluation grounded in real parliamentary behavior for understanding and auditing political bias in modern LLMs.
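
To make the idea of aligning model-generated voting predictions with recorded party votes concrete, the following is a minimal sketch of one simple alignment measure: the per-party agreement rate. It is an illustration under stated assumptions, not necessarily the metric used in the paper. The function name `party_agreement`, the `"for"`/`"against"` vote encoding, and the toy motions and parties are all hypothetical and do not come from the PoliBias datasets.

```python
from collections import defaultdict


def party_agreement(model_votes, party_votes):
    """Return, per party, the fraction of motions on which the model's
    predicted vote matches that party's recorded vote.

    model_votes: dict mapping motion id -> "for" / "against" (hypothetical encoding)
    party_votes: dict mapping motion id -> {party name -> "for" / "against"}
    """
    matches = defaultdict(int)
    totals = defaultdict(int)
    for motion_id, model_vote in model_votes.items():
        # Only parties with a recorded vote on this motion are counted.
        for party, vote in party_votes.get(motion_id, {}).items():
            totals[party] += 1
            if vote == model_vote:
                matches[party] += 1
    return {party: matches[party] / totals[party] for party in totals}


# Toy data, invented purely for illustration.
model_votes = {"m1": "for", "m2": "against", "m3": "for"}
party_votes = {
    "m1": {"PartyA": "for", "PartyB": "against"},
    "m2": {"PartyA": "against", "PartyB": "against"},
    "m3": {"PartyA": "for", "PartyB": "against"},
}

agreement = party_agreement(model_votes, party_votes)
for party, rate in sorted(agreement.items()):
    print(f"{party}: {rate:.2f}")
```

In this toy setup the model agrees with PartyA on every motion and with PartyB on one of three, so a higher rate indicates closer voting alignment with that party.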
