Large Language Models (LLMs) increasingly shape global discourse, making fairness and ideological neutrality essential for responsible AI deployment. Despite growing attention to political bias in LLMs, prior work largely focuses on high-resource, Western languages or narrow multilingual settings, leaving cross-lingual consistency and safe post-hoc mitigation underexplored. To address this gap, we present a large-scale multilingual evaluation of political bias spanning 50 countries and 33 languages. We introduce a complementary post-hoc mitigation framework, Cross-Lingual Alignment Steering (CLAS), designed to augment existing steering methods by aligning ideological representations across languages and dynamically regulating intervention strength. The method aligns the latent ideological representations induced by political prompts into a shared ideological subspace, ensuring cross-lingual consistency, while an adaptive mechanism prevents over-correction and preserves response coherence. Experiments demonstrate substantial bias reduction along both economic and social axes with minimal degradation in response quality. The proposed framework establishes a scalable and interpretable paradigm for fairness-aware multilingual LLM governance, balancing ideological neutrality with linguistic and cultural diversity.
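The steering idea described above can be sketched minimally: project a hidden state onto a bias direction shared across languages and subtract a scaled component, with the scale adapting to how pronounced the bias component is. This is a hypothetical illustration, not the paper's implementation; the function `clas_steer`, the `tanh`-based adaptive gain, and the use of a single direction vector are all assumptions for exposition.

```python
import numpy as np

def clas_steer(hidden, bias_direction, max_alpha=1.0, eps=1e-8):
    """Hypothetical sketch of adaptive cross-lingual steering.

    `bias_direction` stands in for a direction in the shared ideological
    subspace (e.g., derived from contrastive political prompts across
    languages -- an assumption; the paper's exact procedure may differ).
    """
    # Normalize the steering direction.
    d = bias_direction / (np.linalg.norm(bias_direction) + eps)
    # Ideological component of the hidden state along that direction.
    proj = float(hidden @ d)
    # Adaptive strength: intervene more when the bias component is large,
    # tapering toward zero on near-neutral states to avoid over-correction.
    alpha = max_alpha * np.tanh(abs(proj))
    # Remove the scaled bias component; the orthogonal part is untouched.
    return hidden - alpha * proj * d

rng = np.random.default_rng(0)
h = rng.normal(size=8)          # toy hidden state
d = rng.normal(size=8)          # toy bias direction
steered = clas_steer(h, d)
```

Because the gain is bounded in (0, 1), the bias component shrinks but never flips sign, which is one simple way to realize the "prevents over-correction" property mentioned above.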