News recommender systems play an increasingly influential role in shaping information access within democratic societies. However, tailoring recommendations to users' specific interests can result in the divergence of information streams. Fragmented access to information poses challenges to the integrity of the public sphere, thereby influencing democracy and public discourse. The Fragmentation metric quantifies the degree of fragmentation of information streams in news recommendations. Accurate measurement of this metric requires the application of Natural Language Processing (NLP) to identify distinct news events, stories, or timelines. This paper presents an extensive investigation of various approaches for quantifying Fragmentation in news recommendations. These approaches are evaluated both intrinsically, by measuring performance on news story clustering, and extrinsically, by assessing the Fragmentation scores of different simulated news recommender scenarios. Our findings demonstrate that agglomerative hierarchical clustering coupled with SentenceBERT text representation is substantially better at detecting Fragmentation than earlier implementations. Additionally, the analysis of simulated scenarios yields valuable insights and recommendations for stakeholders concerning the measurement and interpretation of Fragmentation.
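To make the idea of a Fragmentation score concrete, the sketch below assumes that each recommended article has already been assigned a story ID, for instance a cluster label produced by a news story clustering step. The abstract does not define how the score is computed, so the mean pairwise Jaccard distance used here is an illustrative stand-in rather than the paper's metric, and the names `jaccard_distance`, `fragmentation`, and the toy user data are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def fragmentation(user_stories):
    """Mean pairwise Jaccard distance between users' sets of story IDs.

    Assumption: this set-based score is a simplified stand-in, not the
    metric defined in the paper. 0.0 means every user sees the same
    stories; 1.0 means no two users share any story.
    """
    story_sets = [set(stories) for stories in user_stories.values()]
    pairs = list(combinations(story_sets, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)


# Hypothetical story IDs per user (e.g. cluster labels of recommended articles).
shared = {"u1": [0, 1, 2], "u2": [0, 1, 2], "u3": [2, 1, 0]}
split = {"u1": [0, 1], "u2": [2, 3], "u3": [4, 5]}
mixed = {"u1": [0, 1], "u2": [1, 2]}

print(fragmentation(shared))  # all users see the same stories
print(fragmentation(split))   # no overlap between any two users
print(fragmentation(mixed))   # partial overlap
```

Because the score depends entirely on the story IDs, errors in the clustering step propagate directly into the measured Fragmentation, which is consistent with the abstract's emphasis on evaluating clustering quality first. A rank-aware comparison of recommendation lists would be a natural refinement of this set-based sketch.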