News recommender systems play an increasingly influential role in shaping information access within democratic societies. However, tailoring recommendations to users' specific interests can result in the divergence of information streams. Fragmented access to information poses challenges to the integrity of the public sphere, thereby influencing democracy and public discourse. The Fragmentation metric quantifies the degree of fragmentation of information streams in news recommendations. Accurate measurement of this metric requires the application of Natural Language Processing (NLP) to identify distinct news events, stories, or timelines. This paper presents an extensive investigation of various approaches for quantifying Fragmentation in news recommendations. These approaches are evaluated both intrinsically, by measuring performance on news story clustering, and extrinsically, by assessing the Fragmentation scores of different simulated news recommender scenarios. Our findings demonstrate that agglomerative hierarchical clustering coupled with SentenceBERT text representation is substantially better at detecting Fragmentation than earlier implementations. Additionally, the analysis of simulated scenarios yields valuable insights and recommendations for stakeholders concerning the measurement and interpretation of Fragmentation.
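To make the pipeline described above concrete, the following is a minimal sketch and not the paper's implementation. It assumes that articles are first grouped into stories by average-linkage agglomerative clustering over text embeddings, and that Fragmentation is then computed as the mean pairwise Jensen-Shannon divergence between users' story distributions; that definition is an assumption, since the abstract does not spell out the formula. The two-dimensional vectors stand in for SentenceBERT embeddings, and the function names and the `threshold` value are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_distance(u, v):
    """Cosine distance between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def agglomerative_clusters(vectors, threshold):
    """Average-linkage agglomerative clustering with a distance cut-off.

    Merging stops once the two closest clusters are farther apart than
    `threshold` (a hypothetical tuning parameter). Returns one story
    label per input vector.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        total = sum(cosine_distance(vectors[i], vectors[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        (x, y), dist = min(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]  # x < y, so deleting y is safe
        del clusters[y]

    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two Counters of story labels."""
    p_total, q_total = sum(p.values()), sum(q.values())
    total = 0.0
    for key in set(p) | set(q):
        pk, qk = p[key] / p_total, q[key] / q_total
        mk = (pk + qk) / 2
        if pk > 0:
            total += 0.5 * pk * math.log2(pk / mk)
        if qk > 0:
            total += 0.5 * qk * math.log2(qk / mk)
    return total


def fragmentation(recommendations, labels):
    """Mean pairwise JS divergence between users' story distributions.

    `recommendations` maps a user id to the list of article indices
    recommended to that user. This definition is an assumption.
    """
    dists = [Counter(labels[i] for i in recs) for recs in recommendations.values()]
    pairs = list(combinations(dists, 2))
    return sum(js_divergence(p, q) for p, q in pairs) / len(pairs)


# Toy embeddings: articles 0 and 1 cover one story, articles 2 and 3 another.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = agglomerative_clusters(vectors, threshold=0.3)

shared = {"user_a": [0, 2], "user_b": [1, 3]}    # both users see both stories
disjoint = {"user_a": [0, 1], "user_b": [2, 3]}  # each user sees only one story
print(labels, fragmentation(shared, labels), fragmentation(disjoint, labels))
```

In this toy setup, two users who see different articles about the same two stories score as unfragmented, while users confined to separate stories reach the maximum score. This is why the quality of the story clustering step directly affects the measured Fragmentation.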