In the era of data-driven Music Information Retrieval (MIR), the scarcity of labeled data has been one of the major obstacles to the success of MIR tasks. In this work, we leverage semi-supervised teacher-student training to improve performance on MIR tasks. For training, we scale up the unlabeled music data to 240k hours, far more than any public MIR dataset provides. We iteratively create and refine the pseudo-labels in the noisy teacher-student training process. We also explore knowledge expansion, iteratively scaling up model size from fewer than 3M to almost 100M parameters. In our experiments, we study how data size and model size jointly relate to performance. By scaling up both model size and training data, our models achieve state-of-the-art results on several MIR tasks compared to models that are either trained in a supervised manner or based on a self-supervised pretrained model. To our knowledge, this is the first attempt to study the effects of scaling up both model and training data for a variety of MIR tasks.
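The iterative noisy teacher-student loop with knowledge expansion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny NumPy MLP, the Gaussian input noise, the soft pseudo-labels, and all names (`init_mlp`, `forward`, `train`, `noisy_student`, `hidden_sizes`) are assumptions chosen only to make the idea concrete at toy scale.

```python
import numpy as np

def init_mlp(n_in, n_hidden, n_out, rng):
    # One-hidden-layer MLP; a stand-in for the (much larger) MIR models.
    return {
        "W1": rng.normal(0, 0.5, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0, 0.5, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, x):
    # Returns hidden activations and softmax class probabilities.
    h = np.tanh(x @ params["W1"] + params["b1"])
    logits = h @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return h, probs

def train(params, x, y_soft, rng, noise_std=0.0, steps=300, lr=0.2):
    # Full-batch gradient descent on cross-entropy against (soft) targets.
    for _ in range(steps):
        # Assumption: Gaussian input noise plays the role of the "noise"
        # applied to the student; the paper's actual noise may differ.
        x_in = x + rng.normal(0, noise_std, x.shape)
        h, probs = forward(params, x_in)
        d_logits = (probs - y_soft) / len(x)
        d_h = (d_logits @ params["W2"].T) * (1 - h ** 2)
        params["W2"] -= lr * (h.T @ d_logits)
        params["b2"] -= lr * d_logits.sum(axis=0)
        params["W1"] -= lr * (x_in.T @ d_h)
        params["b1"] -= lr * d_h.sum(axis=0)
    return params

def noisy_student(x_lab, y_lab, x_unlab, hidden_sizes, n_classes, seed=0):
    rng = np.random.default_rng(seed)
    n_in = x_lab.shape[1]
    y_onehot = np.eye(n_classes)[y_lab]

    # Step 1: train the first (smallest) teacher on labeled data only.
    teacher = train(init_mlp(n_in, hidden_sizes[0], n_classes, rng),
                    x_lab, y_onehot, rng)
    models = [teacher]

    # Step 2: each round, the teacher pseudo-labels the unlabeled data,
    # and a larger student is trained with noise on labeled + pseudo-labeled
    # data. The student then becomes the next teacher.
    for n_hidden in hidden_sizes[1:]:
        _, pseudo = forward(teacher, x_unlab)  # clean inputs for the teacher
        x_all = np.vstack([x_lab, x_unlab])
        y_all = np.vstack([y_onehot, pseudo])
        student = train(init_mlp(n_in, n_hidden, n_classes, rng),
                        x_all, y_all, rng, noise_std=0.1)
        teacher = student
        models.append(student)
    return models
```

Calling `noisy_student(x_lab, y_lab, x_unlab, hidden_sizes=[4, 8, 16], n_classes=2)` returns one model per entry in `hidden_sizes`, each wider than the last. Growing the hidden width between rounds is the toy analogue of scaling model size during knowledge expansion.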