Optical Music Recognition (OMR) has progressed significantly, yielding accurate systems that effectively transcribe music scores into digital formats. Despite this progress, several limitations still prevent OMR from reaching its full potential. Specifically, state-of-the-art OMR still relies on multi-stage pipelines for full-page transcription, and it has only been demonstrated on monophonic cases, leaving aside highly relevant engravings. In this work, we present the Sheet Music Transformer++, an end-to-end model that transcribes full-page polyphonic music scores without requiring a prior Layout Analysis step. This is achieved through extensive curriculum learning-based pretraining with synthetic data generation. We conduct several experiments on a full-page extension of a public polyphonic transcription dataset. The experimental outcomes confirm that the model is competent at transcribing full-page pianoform scores, marking a noteworthy milestone in end-to-end OMR transcription.