Large language models (LLMs) are known to generate politically biased text, yet it remains unclear how such biases arise, which makes effective mitigation strategies difficult to design. We hypothesize that these biases are rooted in the composition of training data. Taking a data-centric perspective, we formulate research questions on (1) the political leaning present in data, (2) data imbalance, (3) cross-dataset similarity, and (4) data-model alignment. To answer them, we analyze the political content of the pre- and post-training datasets of open-source LLMs, combining large-scale sampling, political-leaning classification, and stance detection, and examine how exposure to political content relates to models' stances on policy issues. We find that training data is systematically skewed toward left-leaning content, with pre-training corpora containing substantially more politically engaged material than post-training data. We further observe a strong correlation between the political stances expressed in training data and model behavior, and show that pre-training datasets exhibit similar political distributions despite different curation strategies. Moreover, political biases are already present in base models and persist across post-training stages. These findings highlight the central role of data composition in shaping model behavior and underscore the need for greater data transparency.
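The abstract describes the measurement pipeline only at a high level. As a rough illustration of how sampling, political-leaning classification, and stance tallying could fit together, here is a minimal Python sketch; it is not the paper's actual method. The model (facebook/bart-large-mnli), the label sets, and the single-pass zero-shot design are all assumptions for illustration.

```python
# Hypothetical sketch of a sample-then-classify pipeline (not the paper's
# actual classifiers or sampling scheme). Uses an off-the-shelf zero-shot
# model; label sets below are illustrative assumptions.
import random
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

def classify_leaning(docs, sample_size=1000, seed=0):
    """Sample documents, keep politically engaged ones, tally leaning labels."""
    random.seed(seed)
    sample = random.sample(docs, min(sample_size, len(docs)))
    counts = {"left-leaning": 0, "right-leaning": 0, "center/neutral": 0}
    for doc in sample:
        # Stage 1: gate on whether the document is politically engaged at all.
        gate = classifier(doc, candidate_labels=["political", "not political"])
        if gate["labels"][0] != "political":
            continue
        # Stage 2: classify the leaning of politically engaged documents.
        result = classifier(doc, candidate_labels=list(counts))
        counts[result["labels"][0]] += 1
    return counts

# Usage idea: compare leaning distributions across corpora, e.g.
# pretrain_counts = classify_leaning(pretraining_docs)
# posttrain_counts = classify_leaning(posttraining_docs)
```

The two-stage design mirrors the distinction the abstract draws between how much politically engaged material a corpus contains and which way that material leans, so the two quantities can be reported separately.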