Media bias detection is a complex, multifaceted problem that has traditionally been tackled with single-task models and small in-domain datasets, which limits generalizability. To address this, we introduce MAGPIE, the first large-scale multi-task pre-training approach explicitly tailored for media bias detection. To enable pre-training at scale, we present Large Bias Mixture (LBM), a compilation of 59 bias-related tasks. MAGPIE outperforms previous approaches in media bias detection on the Bias Annotation By Experts (BABE) dataset, with a relative improvement of 3.3% in F1-score. MAGPIE also performs better than previous models on 5 out of 8 tasks in the Media Bias Identification Benchmark (MBIB). Using a RoBERTa encoder, MAGPIE needs only 15% of the finetuning steps required by single-task approaches. Our evaluation shows, for instance, that tasks like sentiment and emotionality boost learning across all tasks, that all tasks enhance fake news detection, and that scaling the number of tasks leads to the best results. MAGPIE confirms that multi-task learning (MTL) is a promising approach for addressing media bias detection, enhancing the accuracy and efficiency of existing models. Furthermore, LBM is the first available resource collection focused on media bias MTL.
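To make the two headline figures concrete, the following minimal sketch shows how a relative F1 improvement differs from an absolute one, and what "15% of finetuning steps" means for a step budget. The baseline F1 score and the single-task step count are hypothetical placeholders chosen only for illustration; they are not results reported for MAGPIE or BABE, and the helper name `relative_improvement` is our own.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Return the change of `new` over `baseline` as a fraction of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline


# Hypothetical baseline F1 (NOT a reported number), used only to illustrate
# what a 3.3% relative improvement would look like.
baseline_f1 = 0.800
new_f1 = baseline_f1 * (1 + 0.033)

rel = relative_improvement(baseline_f1, new_f1)
abs_gain = new_f1 - baseline_f1

# A relative gain of 3.3% is smaller than 3.3 absolute F1 points
# whenever the baseline F1 is below 1.0.
print(f"relative improvement: {rel:.1%}")
print(f"absolute improvement: {abs_gain * 100:.2f} F1 points")

# Hypothetical single-task finetuning budget (NOT a reported number).
single_task_steps = 1000
multi_task_steps = round(0.15 * single_task_steps)
print(f"finetuning steps at 15% of the budget: {multi_task_steps}")
```

The distinction matters when comparing against other reported results: a relative gain is always stated with respect to the previous best score, so the same percentage corresponds to fewer absolute F1 points the lower that score is.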