This study explores the application of recurrent neural networks to recognizing emotions conveyed in music, aiming to enhance music recommendation systems and support therapeutic interventions by tailoring music to listeners' emotional states. We use Russell's Emotion Quadrant to categorize music into four distinct emotional regions and develop models capable of accurately predicting these categories. Our approach extracts a comprehensive set of audio features with Librosa and applies several recurrent neural network architectures, including standard RNNs, Bidirectional RNNs, and Long Short-Term Memory (LSTM) networks. Initial experiments use a dataset of 900 audio clips labeled according to the emotional quadrants. We compare the performance of our neural network models against a set of baseline classifiers and analyze their effectiveness in capturing the temporal dynamics inherent in musical expression. The results indicate that simpler RNN architectures can perform comparably to, or even better than, more complex models, particularly on smaller datasets. We also extend the experiments to two larger datasets: one created by augmenting our original dataset and one drawn from external sources. This research not only deepens our understanding of the emotional impact of music but also demonstrates the potential of neural networks for building more personalized and emotionally resonant music recommendation and therapy systems.
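As a rough illustration of the pipeline summarized above, the sketch below extracts frame-level audio features with Librosa and feeds them to a small LSTM classifier over the four Russell quadrants. The specific feature set (MFCCs, chroma, spectral contrast), window settings, and model shape are assumptions for illustration only, not the exact configuration used in the study.

```python
# Minimal sketch: Librosa feature extraction + LSTM quadrant classifier.
# Feature choices and hyperparameters are illustrative assumptions.
import numpy as np
import librosa
import tensorflow as tf

def extract_feature_sequence(path, sr=22050, n_mfcc=20):
    """Return a (time, features) matrix of frame-level features for one clip."""
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)        # (n_mfcc, T)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)              # (12, T)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)      # (7, T)
    feats = np.vstack([mfcc, chroma, contrast]).T                 # (T, n_mfcc + 19)
    return feats.astype(np.float32)

def build_lstm_classifier(n_features, n_classes=4):
    """Single-layer LSTM followed by a softmax over the four emotion quadrants."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(None, n_features)),  # variable-length sequences
        tf.keras.layers.LSTM(64),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

A Bidirectional RNN variant would simply wrap the recurrent layer (e.g. `tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64))`); the baseline classifiers mentioned in the abstract would instead operate on clip-level summary statistics of the same features.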