Neutrality is difficult to achieve and, in politics, subjective. Traditional media typically adopt an editorial line that potential readers can use as an indicator of the outlet's bias. Several platforms currently rate news outlets according to their political bias. The editorial line and the ratings help readers gather a balanced view of the news. With the advent of instruction-following language models, however, tasks such as writing a newspaper article can be delegated to computers. If no biased persona is imposed, where would an AI-based news outlet fall within these bias ratings? In this work, we use the ratings of authentic news outlets to create a multilingual corpus of news with coarse stance annotations (Left and Right) along with automatically extracted topic annotations. We show that classifiers trained on this data are able to identify the editorial line of most unseen newspapers in English, German, Spanish and Catalan. We then apply the classifiers to 101 newspaper-like articles written by ChatGPT and Bard in the four languages at different time periods. We observe that, similarly to traditional newspapers, ChatGPT's editorial line evolves over time and that, because it is a data-driven system, the stance of the generated articles differs among languages.