Large Language Models (LLMs) increasingly shape global discourse, making fairness and ideological neutrality essential for responsible AI deployment. Despite growing attention to political bias in LLMs, prior work largely focuses on high-resource, Western languages or narrow multilingual settings, leaving cross-lingual consistency and safe post-hoc mitigation underexplored. To address this gap, we present a large-scale multilingual evaluation of political bias spanning 50 countries and 33 languages. We introduce a complementary post-hoc mitigation framework, Cross-Lingual Alignment Steering (CLAS), designed to augment existing steering methods by aligning ideological representations across languages and dynamically regulating intervention strength. CLAS aligns the latent ideological representations induced by political prompts into a shared ideological subspace, ensuring cross-lingual consistency, while an adaptive mechanism prevents over-correction and preserves response coherence. Experiments demonstrate substantial bias reduction along both economic and social axes with minimal degradation in response quality. The proposed framework establishes a scalable and interpretable paradigm for fairness-aware multilingual LLM governance, balancing ideological neutrality with linguistic and cultural diversity.
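The two mechanisms the abstract describes, projection into a shared ideological subspace and adaptive regulation of steering strength, can be sketched in a minimal, hedged form. The function names, the orthonormal-basis representation of the subspace, and the `tanh` strength schedule below are illustrative assumptions for exposition; they are not the paper's actual CLAS operators.

```python
import numpy as np

def align_to_shared_subspace(h, basis):
    """Project a hidden state `h` onto a shared subspace spanned by the
    orthonormal rows of `basis` (illustrative stand-in for cross-lingual
    alignment of ideological representations)."""
    return basis.T @ (basis @ h)

def adaptive_steering(h, steer_dir, max_alpha=1.0):
    """Steer `h` away from an ideological direction, with strength scaled
    by how strongly `h` already expresses that direction, so near-neutral
    states are barely perturbed (a sketch of over-correction prevention).
    The tanh schedule is an assumed example, not the paper's formula."""
    steer_dir = steer_dir / np.linalg.norm(steer_dir)
    bias_score = float(h @ steer_dir)              # projection onto bias axis
    alpha = max_alpha * np.tanh(abs(bias_score))   # adaptive intervention strength
    # Since tanh(x) <= x for x >= 0, alpha never exceeds |bias_score|,
    # so the correction reduces the bias without flipping its sign.
    return h - np.sign(bias_score) * alpha * steer_dir
```

In this sketch, a strongly biased hidden state receives a correction close to `max_alpha`, while a state orthogonal to the bias direction passes through unchanged, mirroring the adaptive regulation the abstract attributes to CLAS.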