Music source separation (MSS) is constrained by the limited availability of correctly labeled individual instrument tracks. As larger datasets are collected to improve MSS performance, mislabeled instrument tracks become inevitable and must be addressed. This paper introduces an automated technique for refining the labels of a partially mislabeled dataset. When applied to a noisy-labeled dataset, our proposed self-refining technique loses only 1% accuracy in multi-label instrument recognition compared to a classifier trained on a clean-labeled dataset. The study demonstrates the importance of refining noisy-labeled data for MSS model training and shows that training on the refined dataset yields results comparable to those obtained with a clean-labeled dataset. Notably, when only a noisy dataset is available, MSS models trained on the self-refined dataset even outperform those trained on a dataset refined with a classifier trained on clean labels.