Music recommendation systems have become a vital component for enhancing user experience and satisfaction on music streaming services, which dominate music consumption. A key challenge in improving these recommender systems lies in comprehending the complexity of music data, particularly for the underlying task of music genre classification. The limitations of manual genre classification have highlighted the need for Automatic Music Genre Classification (AMGC) systems. While traditional machine learning techniques have shown potential in genre classification, they rely heavily on manually engineered features and feature selection and fail to capture the full complexity of music data. Deep learning architectures such as conventional Convolutional Neural Networks (CNNs), on the other hand, are effective at capturing spatial hierarchies but struggle to capture the temporal dynamics inherent in music data. To address these challenges, this study proposes a novel approach that takes visual spectrograms as input to a hybrid model combining the strengths of the Residual Neural Network (ResNet) and the Gated Recurrent Unit (GRU). The model is designed to provide a more comprehensive analysis of music data and hence potentially more accurate genre classification, offering the potential to improve music recommender systems.
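To make the data flow of such a hybrid concrete, the following is a minimal NumPy sketch of the general idea: a residual block extracts per-frame features from a spectrogram, and a GRU then summarises those features over time before a softmax over genres. It is not the architecture used in this study. The layer sizes, the random untrained weights, the use of dense layers in place of ResNet's 2-D convolutions, and all function names (`residual_block`, `gru_step`, `init_params`, `classify`) are assumptions made purely for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def residual_block(x, w1, w2):
    """Two linear layers with ReLU plus an identity skip connection.

    Assumption: dense layers stand in for the convolutions of a real ResNet.
    """
    h = np.maximum(x @ w1, 0.0)
    return np.maximum(x + h @ w2, 0.0)  # skip connection: add the input back


def gru_step(x_t, h_prev, p):
    """One GRU update for a single time frame."""
    z = sigmoid(x_t @ p["Wz"] + h_prev @ p["Uz"] + p["bz"])  # update gate
    r = sigmoid(x_t @ p["Wr"] + h_prev @ p["Ur"] + p["br"])  # reset gate
    h_cand = np.tanh(x_t @ p["Wh"] + (r * h_prev) @ p["Uh"] + p["bh"])
    return (1.0 - z) * h_prev + z * h_cand


def init_params(n_freq, hidden, n_genres, seed=0):
    """Random, untrained weights (hypothetical sizes, illustration only)."""
    rng = np.random.default_rng(seed)

    def mat(rows, cols):
        return rng.normal(0.0, 0.1, size=(rows, cols))

    gru = {}
    for gate in "zrh":
        gru["W" + gate] = mat(n_freq, hidden)
        gru["U" + gate] = mat(hidden, hidden)
        gru["b" + gate] = np.zeros(hidden)
    return {
        "w1": mat(n_freq, n_freq),
        "w2": mat(n_freq, n_freq),
        "gru": gru,
        "w_out": mat(hidden, n_genres),
    }


def classify(spectrogram, params):
    """Map a (n_frames, n_freq) spectrogram to genre probabilities."""
    feats = residual_block(spectrogram, params["w1"], params["w2"])
    h = np.zeros(params["w_out"].shape[0])
    for x_t in feats:  # the GRU walks over the time frames in order
        h = gru_step(x_t, h, params["gru"])
    logits = h @ params["w_out"]
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()
```

Calling `classify(spec, init_params(n_freq=16, hidden=8, n_genres=10))` on a `(20, 16)` array returns a length-10 probability vector. With untrained weights the probabilities are meaningless; the sketch only shows how spatial feature extraction and temporal modelling can be chained.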