Large language model (LLM)-based summarization and text generation are increasingly used to produce and rewrite text, raising concerns about political framing in journalism, where subtle wording choices can shape interpretation. We study political framing across nine state-of-the-art LLMs, testing whether each model's classification-based bias signals align with the framing behavior of its generated summaries. We first compare few-shot ideology predictions against LEFT/CENTER/RIGHT labels. We then generate "steered" summaries under FAITHFUL, CENTRIST, LEFT, and RIGHT prompts, and score all outputs with a single fixed ideology evaluator. We find pervasive ideological center-collapse in both article-level ratings and generated text, indicating a systematic tendency toward centrist framing. Among the evaluated models, Grok 4 is by far the most ideologically expressive generator, while Claude Sonnet 4.5 and Llama 3.1 achieve the strongest bias-rating performance among commercial and open-weight models, respectively.
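The steered-generation and evaluation protocol described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the prompt wordings, the `generate_summary` stub, and the `center_collapse_rate` metric name are all assumptions introduced here for clarity.

```python
# Illustrative sketch of the steered-summarization protocol.
# STEER_PROMPTS wordings and function names are hypothetical,
# not taken from the paper.

STEER_PROMPTS = {
    "FAITHFUL": "Summarize the article, preserving its original framing.",
    "CENTRIST": "Summarize the article with a neutral, centrist framing.",
    "LEFT": "Summarize the article with a left-leaning framing.",
    "RIGHT": "Summarize the article with a right-leaning framing.",
}

def build_steered_prompt(article: str, steer: str) -> str:
    """Compose the steering instruction and article text for one condition."""
    return f"{STEER_PROMPTS[steer]}\n\nArticle:\n{article}"

def center_collapse_rate(evaluator_labels: list[str]) -> float:
    """Fraction of outputs the fixed evaluator rates CENTER, regardless of
    the requested steering; a high value indicates center-collapse."""
    return sum(1 for lab in evaluator_labels if lab == "CENTER") / len(evaluator_labels)
```

In this sketch, each article is expanded into four prompts (one per steering condition), every generated summary is labeled by the same fixed evaluator, and the share of CENTER labels under the LEFT and RIGHT conditions quantifies how strongly a model collapses toward centrist framing.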