While automatic music transcription is well established in music information retrieval, most models transcribe only pitch and timing information from audio and thus omit crucial expressive and instrument-specific nuances. One example is playing technique on the violin, which gives the instrument its distinct palette of timbres for maximal emotional impact. Here, we propose VioPTT (Violin Playing Technique-aware Transcription), a lightweight cascade model that directly transcribes violin playing technique in addition to pitch onsets and offsets. Furthermore, we release MOSA-VPT, a novel, high-quality synthetic dataset of violin playing techniques that circumvents the need for manual annotation. Using this dataset, our model generalizes strongly to real-world note-level violin technique recordings while also achieving state-of-the-art transcription performance. To our knowledge, VioPTT is the first model to combine violin transcription and playing technique prediction within a unified framework.