ReconVAT: A Semi-Supervised Automatic Music Transcription Framework for Low-Resource Real-World Data

Most current supervised automatic music transcription (AMT) models generalize poorly: they struggle to transcribe real-world recordings from musical genres that are not represented in the labelled training data. In this paper, we propose ReconVAT, a semi-supervised framework that addresses this issue by leveraging the large amount of available unlabelled music recordings. ReconVAT combines a reconstruction loss with virtual adversarial training. When combined with existing U-net models for AMT, it achieves competitive results on common benchmark datasets such as MAPS and MusicNet. For example, in the few-shot setting on the string version of MusicNet, ReconVAT achieves F1-scores of 61.0% and 41.6% on the note-wise and note-with-offset-wise metrics respectively, an improvement of 22.2% and 62.5% over the supervised baseline model. The proposed framework also shows potential for continual learning on new data, which could be useful in real-world applications where new data is constantly becoming available.
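The improvement figures above are easiest to interpret as relative gains: reading 62.5% as an absolute gain would imply a negative baseline for the note-with-offset metric. Under that assumption, which the abstract does not state explicitly, the baseline F1-scores can be back-calculated as roughly 49.9% and 25.6%. The short sketch below shows the arithmetic; the helper name `implied_baseline` is hypothetical, and the resulting values are implied by the abstract rather than reported in it.

```python
def implied_baseline(score_pct: float, relative_gain_pct: float) -> float:
    """Baseline score implied by a final score and a relative gain (both in percent).

    Assumes the gain is relative: score = baseline * (1 + gain / 100).
    """
    return score_pct / (1.0 + relative_gain_pct / 100.0)


# (F1-score, stated improvement) pairs taken from the abstract.
reported = {
    "note": (61.0, 22.2),
    "note_with_offset": (41.6, 62.5),
}

# Back-calculate the supervised baseline for each metric.
baselines = {
    name: implied_baseline(f1, gain) for name, (f1, gain) in reported.items()
}

for name, baseline in baselines.items():
    print(f"{name}: implied baseline F1 = {baseline:.1f}%")
```

These are derived estimates only; the baseline numbers reported in the paper itself should take precedence.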

