Machine learning (ML) models are widely used to drive recommender systems, an active area of research. This is especially true in the music industry, which is seeing a surge in growth. Beyond their large active user bases, these systems are fueled by massive amounts of data. Applications built on these large-scale systems aim to provide a better user experience and to keep customers actively engaged. This paper describes a distributed ML pipeline that takes a subset of songs as input and produces a new subset of songs identified as similar to the input. The publicly accessible Million Song Dataset (MSD) enables researchers to develop and explore reasonably efficient systems for audio track analysis and recommendation without access to a commercial music platform. The objective of the proposed application is to use a trained ML system to recommend songs that a user is likely to enjoy.