Optical Music Recognition (OMR) has progressed significantly, yielding accurate systems that effectively transcribe music scores into digital formats. Despite this progress, several limitations still keep OMR from reaching its full potential. In particular, state-of-the-art OMR still relies on multi-stage pipelines to perform full-page transcription, and it has only been demonstrated on monophonic cases, leaving aside highly relevant engravings. In this work, we present the Sheet Music Transformer++, an end-to-end model that transcribes full-page polyphonic music scores without requiring a prior Layout Analysis step. This is achieved through extensive curriculum learning-based pretraining with synthetic data generation. We conduct several experiments on a full-page extension of a public polyphonic transcription dataset. The experimental outcomes confirm that the model is capable of transcribing full-page pianoform scores, marking a noteworthy milestone in end-to-end OMR transcription.
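As an illustration only, the sketch below suggests how a curriculum over synthetic data of increasing difficulty, followed by fine-tuning on real full-page scores, might be organized. All names here (CurriculumStage, run_curriculum, the model's train_step method, and the stage definitions) are hypothetical assumptions for exposition and do not reflect the authors' actual implementation.

```python
# Minimal, hypothetical sketch of curriculum-learning pretraining on synthetic data.
# The model interface (model.train_step) and stage layout are assumptions, not the
# Sheet Music Transformer++ training recipe.
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

Sample = Tuple[object, object]  # (score image, target transcription)


@dataclass
class CurriculumStage:
    name: str                                   # human-readable stage label
    sample_source: Callable[[], Iterable[Sample]]  # generator of synthetic samples
    steps: int                                  # training steps spent in this stage


def run_curriculum(model, stages: Iterable[CurriculumStage],
                   fine_tune_data: Iterable[Sample]) -> None:
    """Pretrain on progressively harder synthetic data, then fine-tune on real pages."""
    for stage in stages:
        generator = stage.sample_source()
        for _, (image, target) in zip(range(stage.steps), generator):
            model.train_step(image, target)     # assumed training API
    # Final stage: real full-page polyphonic scores.
    for image, target in fine_tune_data:
        model.train_step(image, target)
```

A plausible usage would define stages that move from single-staff synthetic excerpts to multi-system synthetic pages before the real-data fine-tuning pass, but the concrete stage ordering and sample generators would depend on the paper's actual setup.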