Music Emotion Recognition (MER) is the automatic identification of emotional elements within music tracks, and it has attracted significant attention because of its broad applicability in Music Information Retrieval. It can also serve as an upstream task for many other human-related tasks, such as emotional music generation and music recommendation. According to existing psychology research, music emotion is determined by multiple factors, such as the timbre, velocity, and structure of the music. Incorporating multiple factors in MER helps achieve more interpretable and finer-grained methods. However, most prior works were uni-domain and showed weak consistency between arousal modeling performance and valence modeling performance. Against this background, we design a multi-domain emotion modeling method for instrumental music that combines symbolic analysis and acoustic analysis. In addition, because music data is scarce and difficult to label, our multi-domain approach makes fuller use of the limited data available. We implemented and evaluated our approach on the publicly available piano dataset EMOPIA, where it improves overall accuracy by 2.4% over our baseline model and achieves state-of-the-art performance.