In this paper, we introduce MediaSpin, a dataset intended to support the development of models that detect different forms of media bias in news headlines. The dataset was built through Large Language Model (LLM) labeling of media bias, supervised and validated by humans. The corpus comprises 78,910 pairs of news headlines and annotations, with explanations of the assigned categories among 13 distinct types of media bias. We demonstrate the usefulness of the dataset for automated bias detection in news edits.