NeMig -- A Bilingual News Collection and Knowledge Graph about Migration

News recommendation plays a critical role in shaping the public's worldviews through the way it filters and disseminates information on different topics. Given the crucial impact that media has on opinion formation, especially for sensitive topics, understanding the effects of personalized recommendation beyond accuracy has become essential in today's digital society. In this work, we present NeMig, a bilingual news collection on the topic of migration, together with corresponding rich user data. In contrast to existing news recommendation datasets, which comprise a large variety of monolingual news, NeMig covers articles on a single controversial topic, published in both Germany and the US. We annotate the sentiment polarization of the articles and the political leanings of the media outlets, and we extract subtopics and named entities disambiguated through Wikidata. These features can be used to analyze the effects of algorithmic news curation beyond accuracy-based performance, such as recommender biases and the creation of filter bubbles. We construct domain-specific knowledge graphs from the news text and metadata, thus encoding knowledge-level connections between articles. Importantly, while existing datasets include only click behavior, we collect user socio-demographic and political information in addition to explicit click feedback. We demonstrate the utility of NeMig through experiments on three tasks: benchmarking news recommenders, analyzing recommender biases, and analyzing news trends. NeMig aims to provide a useful resource for the news recommendation community and to foster interdisciplinary research into the multidimensional effects of algorithmic news curation.
