Piano fingering, that is, knowing which finger to use to play each note in a musical piece, is a hard and important skill to master when learning to play the piano. While some sheet music is available with expert-annotated fingering information, most pieces lack it, and people often resort to learning the fingering from demonstrations in online videos. We consider the AI task of automating the extraction of fingering information from videos. This is a non-trivial task: fingers are often occluded by other fingers, and it is often not clear from the video which keys were pressed, so the hand position information must be synchronized with knowledge about the notes that were played. We show how to perform this task with high accuracy using a combination of deep-learning modules, including a GAN-based approach for fine-tuning on out-of-domain data. We extract the fingering information with an F1 score of 97\%. We run the resulting system on 90 videos, yielding high-quality piano fingering information for 150K notes, the largest available piano-fingering dataset to date.