Political bias in large language models (LLMs) is an increasingly significant concern, yet it remains difficult to measure reproducibly across political and linguistic contexts. We introduce Polar, a 4,026-instance multiple-choice benchmark that measures political bias through option-level likelihoods rather than prompt-based generation. Polar covers two ideological axes and eight issue categories derived from the Manifesto Project, and evaluates models in parallel across U.S. and South Korean political contexts. Across 38 LLMs, measured bias varies systematically with political context, issue category, model group, and presentation language. All models lean left-progressive on U.S. political content but show more centered and mixed patterns on South Korean content. Translation experiments further show that presentation language alone can shift measured bias. These findings highlight the need for multilingual and cross-contextual evaluation of political bias in LLMs.
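To make the idea of option-level likelihood scoring concrete, the following is a minimal sketch and not Polar's actual scoring procedure, which the abstract does not specify. It assumes that each answer option already has a model log-likelihood and a stance label (-1 for left-progressive, +1 for right-conservative, 0 for neutral). The function names, the stance encoding, the sign convention, and the numbers in the example are all hypothetical.

```python
import math


def option_probabilities(log_likelihoods):
    """Normalize per-option log-likelihoods into probabilities (stable softmax)."""
    peak = max(log_likelihoods)
    weights = [math.exp(ll - peak) for ll in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]


def lean_score(log_likelihoods, stances):
    """Hypothetical lean score in [-1, 1].

    Assumed convention: stance -1 = left-progressive, +1 = right-conservative,
    0 = neutral. The score is the probability-weighted mean stance, so a
    negative value means more probability mass on left-labeled options.
    """
    probs = option_probabilities(log_likelihoods)
    return sum(p * s for p, s in zip(probs, stances))


# Made-up log-likelihoods for one three-option item (not real benchmark data).
example_log_likelihoods = [-1.2, -2.5, -3.0]
example_stances = [-1, +1, 0]
example_score = lean_score(example_log_likelihoods, example_stances)
```

Because the score depends only on the likelihoods the model assigns to fixed options, it avoids parsing free-form generations, which is one plausible reason such a measure is easier to reproduce than prompt-based generation.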