This paper examines the influence of recommender systems on local music representation, discussing the findings of a prior empirical study on the public LFM-2b dataset. That study argued that different recommender systems exhibit algorithmic biases that shift music consumption either towards or away from local content. However, LFM-2b users do not reflect the diverse audience of music streaming services. To assess the robustness of the study's conclusions, we conduct a comparative analysis using proprietary listening data from a global music streaming service, which we publicly release alongside this paper. We observe significant differences in local music consumption patterns between our dataset and LFM-2b, suggesting that caution should be exercised when drawing conclusions about local music based solely on LFM-2b. Moreover, we show that the algorithmic biases reported in the original work differ on our dataset, and that several previously unexplored model parameters can significantly influence these biases and alter the study's conclusions on both datasets. Finally, we discuss the complexity of accurately labeling local music, emphasizing the risk of misleading conclusions due to unreliable, biased, or incomplete labels. To encourage further research and ensure reproducibility, we publicly share our dataset and code.