Music auto-tagging is crucial for enhancing music discovery and recommendation. Existing models in Music Information Retrieval (MIR) struggle with the real-world noise found in multimedia content, such as environmental sounds and speech. This study proposes a method, inspired by speech-related tasks, to improve music auto-tagging performance in noisy settings. The approach brings Domain Adversarial Training (DAT) into the music domain, yielding music representations that are robust to noise. Unlike previous research, this approach adds a pretraining phase for the domain classifier to avoid performance degradation in the subsequent phase. Adding a variety of synthesized noisy music data improves the model's generalization across different noise levels. The proposed architecture improves music auto-tagging performance by effectively utilizing unlabeled noisy music data. Additional experiments with supplementary unlabeled data further improve the model's performance, underscoring its robust generalization and broad applicability.
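The abstract does not specify how the noisy training data are synthesized. One common approach is to mix a noise clip into a music clip at a target signal-to-noise ratio (SNR), sampling the SNR so the data cover a range of noise levels. The Python sketch below is illustrative only and rests on several assumptions: mono waveforms stored as NumPy arrays at a shared sample rate, hypothetical helper names `mix_at_snr` and `make_noisy_batch`, and a hypothetical 0 to 20 dB SNR range. None of these are the authors' settings.

```python
import numpy as np


def mix_at_snr(music: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Hypothetical helper: add `noise` to mono `music` at the requested SNR (dB)."""
    if noise.shape[0] < music.shape[0]:
        # Tile a short noise clip so it covers the whole music excerpt.
        reps = int(np.ceil(music.shape[0] / noise.shape[0]))
        noise = np.tile(noise, reps)
    noise = noise[: music.shape[0]]

    p_music = np.mean(music ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0.0:
        # Silent noise clip: nothing to mix in.
        return music.copy()

    # Pick gain g so that p_music / (g**2 * p_noise) == 10 ** (snr_db / 10).
    gain = np.sqrt(p_music / (p_noise * 10.0 ** (snr_db / 10.0)))
    return music + gain * noise


def make_noisy_batch(music, noises, snr_range=(0.0, 20.0), rng=None):
    """Hypothetical helper: pair each clip with a random noise at a random SNR.

    `music` is a 2-D array (clips x samples); `noises` is a list of 1-D arrays.
    The default SNR range is an assumption for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy = []
    for clip in music:
        noise = noises[rng.integers(len(noises))]
        snr_db = rng.uniform(*snr_range)
        noisy.append(mix_at_snr(clip, noise, snr_db))
    return np.stack(noisy)
```

Sampling the SNR per clip, rather than fixing it, produces the spread of noise levels that the abstract credits for better generalization. The noisy clips would then serve as unlabeled target-domain inputs to the DAT stage.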