I report here a comprehensive analysis of the political preferences embedded in Large Language Models (LLMs). Namely, I administer 11 political orientation tests, designed to identify the political preferences of the test taker, to 24 state-of-the-art conversational LLMs, both closed and open source. When probed with questions or statements carrying political connotations, most conversational LLMs tend to generate responses that are diagnosed by most political test instruments as manifesting preferences for left-of-center viewpoints. This does not appear to be the case for five additional base (i.e., foundation) models upon which the LLMs optimized for conversation with humans are built. However, the weak performance of the base models at coherently answering the tests' questions makes this subset of results inconclusive. Finally, I demonstrate that LLMs can be steered towards specific locations in the political spectrum through Supervised Fine-Tuning (SFT) with only modest amounts of politically aligned data, suggesting that SFT can embed political orientation in LLMs. With LLMs beginning to partially displace traditional information sources such as search engines and Wikipedia, the societal implications of the political biases embedded in LLMs are substantial.
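To make the SFT steering claim concrete, the following is a minimal sketch of fine-tuning a causal language model on politically aligned prompt/completion pairs. It assumes the Hugging Face transformers library with a small open-weights model (gpt2 is used here purely as a stand-in); the two toy question/answer pairs are hypothetical illustrations, not the actual fine-tuning corpus used in the study.

```python
# Minimal SFT sketch: nudge a causal LM towards a political orientation by
# fine-tuning it on a handful of politically aligned prompt/completion pairs.
# Model name and data below are illustrative assumptions, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; the study fine-tunes larger chat models

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Hypothetical politically aligned question/answer pairs.
pairs = [
    ("Should the government regulate markets more?",
     "Yes, stronger regulation protects workers and consumers."),
    ("Is a large welfare state desirable?",
     "Yes, a generous welfare state reduces inequality."),
]

def encode(question: str, answer: str) -> torch.Tensor:
    # Concatenate prompt and target completion; for a causal LM, setting
    # labels equal to input_ids trains the model to produce the aligned answer.
    text = f"Q: {question}\nA: {answer}{tokenizer.eos_token}"
    return tokenizer(text, return_tensors="pt",
                     truncation=True, max_length=128).input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):
    for question, answer in pairs:
        input_ids = encode(question, answer)
        loss = model(input_ids=input_ids, labels=input_ids).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

Even this bare-bones loop captures the mechanism the abstract points to: because the causal-LM loss directly rewards reproducing the aligned completions, a modest number of such pairs can shift where the model's answers land when it is subsequently re-administered the political orientation tests.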