Self-refining of Pseudo Labels for Music Source Separation with Noisy Labeled Data

Music source separation (MSS) faces challenges due to the limited availability of correctly-labeled individual instrument tracks. With the push to acquire larger datasets to improve MSS performance, the inevitability of encountering mislabeled individual instrument tracks becomes a significant challenge to address. This paper introduces an automated technique for refining the labels in a partially mislabeled dataset. Our proposed self-refining technique, employed with a noisy-labeled dataset, results in only a 1% accuracy degradation in multi-label instrument recognition compared to a classifier trained on a clean-labeled dataset. The study demonstrates the importance of refining noisy-labeled data in MSS model training and shows that utilizing the refined dataset leads to comparable results derived from a clean-labeled dataset. Notably, upon only access to a noisy dataset, MSS models trained on a self-refined dataset even outperform those trained on a dataset refined with a classifier trained on clean labels.

翻译：音乐源分离（MSS）面临因正确标注的独立乐器音轨数据稀缺所带来的挑战。在通过扩大数据集提升MSS性能的驱动下，不可避免地会遭遇标注错误的独立乐器音轨问题，这成为一项亟待解决的重要挑战。本文提出了一种自动化技术，用于优化部分标注错误的数据集标签。我们提出的自优化技术应用于含噪声标注数据集时，相较于在纯净标注数据集上训练的分类器，多标签乐器识别的准确率仅下降1%。研究表明，在MSS模型训练中优化含噪声标注数据具有关键意义，且使用优化后的数据集可获得与纯净标注数据集相当的训练效果。值得注意的是，在仅能获取含噪声数据集的情况下，基于自优化数据集训练的MSS模型甚至优于使用经纯净标注分类器优化数据集训练的模型。

相关内容

数据集

关注 88

数据集，又称为资料集、数据集合或资料集合，是一种由数据所组成的集合。
Data set（或dataset）是一个数据的集合，通常以表格形式出现。每一列代表一个特定变量。每一行都对应于某一成员的数据集的问题。它列出的价值观为每一个变量，如身高和体重的一个物体或价值的随机数。每个数值被称为数据资料。对应于行数，该数据集的数据可能包括一个或多个成员。

【CVPR 2022】基于元内存传输的跨域少镜头语义分割，Remember the Difference: Cross-Domain Few-Shot Semantic Segmentation via Meta-Memory Transfer

专知会员服务

13+阅读 · 2022年3月12日

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日