Music classification has been one of the most popular tasks in the field of music information retrieval. With the development of deep learning models, the last decade has seen impressive improvements in a wide range of classification tasks. However, increasing model complexity makes both training and inference computationally expensive. In this paper, we integrate the ideas of transfer learning and feature-based knowledge distillation and systematically investigate using pre-trained audio embeddings as teachers to guide the training of low-complexity student networks. Regularizing the feature space of the student networks with the pre-trained embeddings transfers the knowledge in the teacher embeddings to the students. We use various pre-trained audio embeddings and test the effectiveness of the method on the tasks of musical instrument classification and music auto-tagging. Results show that our method significantly outperforms the identical model trained without the teacher's knowledge. This technique can also be combined with classical knowledge distillation approaches to further improve the model's performance.
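To make the idea of regularizing the student's feature space concrete, the following is a minimal sketch of a combined training objective, not the exact formulation used in the paper. It assumes a multi-label task loss (binary cross-entropy), a cosine-distance regularizer as a stand-in for whichever feature-space distance is actually used, a linear map `projection` that brings the student features to the teacher embedding dimension, and a weighting factor `lam`. All function and parameter names here are hypothetical.

```python
import numpy as np


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over a batch of multi-label predictions."""
    probs = np.clip(probs, eps, 1.0 - eps)
    losses = targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs)
    return float(-np.mean(losses))


def cosine_distance(student_feats, teacher_embs, eps=1e-8):
    """Mean (1 - cosine similarity) between rows of two (batch, dim) arrays."""
    s = student_feats / (np.linalg.norm(student_feats, axis=1, keepdims=True) + eps)
    t = teacher_embs / (np.linalg.norm(teacher_embs, axis=1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))


def regularized_loss(probs, targets, student_feats, teacher_embs, projection, lam=0.5):
    """Task loss plus a feature-space regularizer toward the teacher embeddings.

    `projection` (student_dim, teacher_dim) and `lam` are assumptions made
    for this sketch; lam=0 recovers plain supervised training.
    """
    projected = student_feats @ projection  # match the teacher dimension
    task = binary_cross_entropy(probs, targets)
    reg = cosine_distance(projected, teacher_embs)
    return (1.0 - lam) * task + lam * reg


# Toy batch: 4 clips, 3 tags, 8-dim student features, 16-dim teacher embeddings.
rng = np.random.default_rng(0)
probs = rng.uniform(0.05, 0.95, size=(4, 3))
targets = rng.integers(0, 2, size=(4, 3)).astype(float)
student_feats = rng.normal(size=(4, 8))
teacher_embs = rng.normal(size=(4, 16))
projection = rng.normal(size=(8, 16))

print(regularized_loss(probs, targets, student_feats, teacher_embs, projection))
```

In this sketch the teacher embeddings are treated as fixed targets computed once from the pre-trained model, so only the student (and the assumed projection) would receive gradients during training, and the teacher is not needed at inference time.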