It's the humans, not the data: Geopolitical bias in LLMs originates in post-training, amplified by the language of the prompt

It has generally been assumed that geopolitical bias in language models originates in the data used during pre-training. We tested seven open-weight LLM pairs from seven labs. Each pair consisted of a base model (pre-training only) and its chat model (pre-training plus post-training). We used a paired-scenario forced-choice probe over 28 country pairs in English, French, and Chinese, and found that geopolitical bias originates in post-training rather than in pre-training. Of the seven labs, six showed post-training shifts toward the country or region of the model developer. The shift is strongest in Alibaba's Qwen 2.5. The base model is neutral on China-favourability (-0.15 log-odds, p=0.15), whereas the post-trained chat variant is at +2.91 log-odds (p<10^-4), which corresponds to odds of roughly 18 to 1 in China's favour (e^2.91 ≈ 18). We also observe post-training shifts in bias toward other countries across all models. The magnitude of the shift also depends on the prompt language: Mistral, developed in France, becomes pro-France only under French prompting (French-minus-English shift of +1.91 log-odds, p<10^-4). These findings suggest that geopolitical preferences in language models are not simply inherited from large-scale internet data but are actively shaped during post-training. This highlights the need for greater transparency, auditing, and oversight of the alignment processes that influence how models represent nations, cultures, and political perspectives.
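The log-odds figures above can be unpacked with a little arithmetic. The following is a minimal, hypothetical sketch of how a forced-choice favourability score might be computed. It is not the authors' scoring code, and it rests on three assumptions. First, the model's probability of choosing country A is read once with A listed first and once with the order swapped. Second, the two logits are averaged to cancel a pure first-position preference. Third, a country's favourability is the mean signed log-odds over the pairs it appears in. The function names and the placeholder countries X, Y, Z are invented for illustration. The last helper converts reported log-odds into odds factors. On that reading, e^2.91 ≈ 18 is the chat model's odds relative to even odds, while the base-to-chat change of 2.91 - (-0.15) = 3.06 log-odds corresponds to a factor of about 21.

```python
import math
from statistics import mean


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def pair_log_odds(p_a_when_first: float, p_a_when_second: float) -> float:
    """Hypothetical per-scenario score: log-odds of choosing country A over B.

    Both arguments are the model's probability of picking A, once with A
    listed as the first option and once with the order swapped. Averaging
    the two logits cancels a pure position bias.
    """
    return (logit(p_a_when_first) + logit(p_a_when_second)) / 2.0


def country_favourability(results: list[tuple[str, str, float]], country: str) -> float:
    """Hypothetical aggregate: mean signed log-odds in favour of `country`.

    `results` holds (country_a, country_b, log_odds_a_over_b) records; the
    sign is flipped when `country` appears on the B side of a pair.
    """
    signed = [lo if a == country else -lo
              for a, b, lo in results if country in (a, b)]
    return mean(signed)


def odds_multiplier(log_odds: float) -> float:
    """Turn a log-odds value, or a difference of two, into an odds factor."""
    return math.exp(log_odds)


if __name__ == "__main__":
    # Reported Qwen 2.5 China-favourability, in log-odds: base -0.15, chat +2.91.
    print(round(odds_multiplier(2.91), 1))            # 18.4: chat model vs. even odds
    print(round(odds_multiplier(2.91 - (-0.15)), 1))  # 21.3: change from base to chat
    # Reported Mistral French-minus-English shift: +1.91 log-odds.
    print(round(odds_multiplier(1.91), 1))            # 6.8
    # Toy records with placeholder countries (not data from the study).
    toy = [("X", "Y", 1.0), ("Z", "X", 0.5)]
    print(country_favourability(toy, "X"))            # 0.25
```

Working on the logit scale makes shifts additive. A base-to-chat or French-to-English difference is just a subtraction of log-odds, and exponentiating that difference gives the multiplicative change in odds.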


Related content

MoDELS


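The 23rd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems is the premier conference series for model-driven software and systems engineering, organized with the support of ACM SIGSOFT and IEEE TCSE. Since 1998, MODELS has covered all aspects of modeling, from languages and methods to tools and applications. MODELS participants come from diverse backgrounds, including researchers, academics, engineers, and industry professionals. MODELS 2019 is a forum where participants can exchange cutting-edge research results and innovative practical experience in modeling and model-driven software and systems. This year's edition will give the modeling community the opportunity to further advance the foundations of modeling and to present innovative applications of modeling in emerging areas such as cyber-physical systems, embedded systems, socio-technical systems, cloud computing, big data, machine learning, security, open source, and sustainability. Official website: http://www.modelsconference.org/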

A Comprehensive Study of Implicit and Explicit Biases in Large Language Models

Zhuanzhi Member Services

17+ reads · 25 November 2025

The Limits of Scaling Effects in Large Language Models

Zhuanzhi Member Services

14+ reads · 18 November 2025

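"Red Lines and Gray Zones in the Fog of War: Benchmarking Military Decision-Making Risk and Regional Bias in Large Language Models", latest 54-page report (2025)

Zhuanzhi Member Services

36+ reads · 10 October 2025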

Algorithmic Bias in AI Military Decision-Support Systems

Zhuanzhi Member Services

34+ reads · 11 September 2024