Recent research shows that visualizing linguistic bias mitigates its negative effects. However, reliable automatic detection methods to generate such visualizations require costly, knowledge-intensive training data. To facilitate data collection for media bias datasets, we present News Ninja, a game that uses data-collecting game mechanics to generate a crowdsourced dataset. Before annotating sentences, players are educated on media bias via a tutorial. Our findings show that datasets gathered with crowdsourced workers trained on News Ninja can reach significantly higher inter-annotator agreement than expert and crowdsourced datasets with similar data quality. Because News Ninja encourages continuous play, it allows datasets to adapt to the reception and contextualization of news over time, presenting a promising strategy to reduce data collection expenses, educate players, and promote long-term bias mitigation.