Automatic music transcription (AMT) has achieved high accuracy for piano thanks to large, high-quality datasets such as MAESTRO and MAPS, but comparable datasets are not yet available for other instruments. Recent work, however, has demonstrated that aligning scores to transcription model activations can produce high-quality AMT training data for instruments other than piano. Focusing on the guitar, we refine this score-based training approach using a dataset of commercially available score-audio pairs. We propose using a high-resolution piano transcription model to train a new guitar transcription model. The resulting model obtains state-of-the-art transcription results on GuitarSet in a zero-shot setting, improving on previously published methods.