Sentiment or mood can be expressed on various levels in music. Automatic analysis usually focuses on the audio data itself, but the lyrics can also play a crucial role in how moods are perceived. We first evaluate various models for sentiment analysis based on lyrics and on audio separately. These approaches already achieve satisfactory results, but they also exhibit weaknesses, whose causes we examine in more detail. Furthermore, we propose and evaluate different approaches to combining the audio and lyrics results. Considering both modalities generally improves performance. We investigate misclassifications and contradictions (including intentional ones) between audio and lyrics sentiment more closely and identify possible causes. Finally, we address fundamental problems in this research area, such as high subjectivity, lack of data, and inconsistent emotion taxonomies.