Optical Music Recognition for Real-World Manuscripts with Synthetic Data

Optical Music Recognition (OMR) has seen major progress in model design, with end-to-end methods now capable of recognising notation at all levels of complexity. However, the impact of this progress has been limited by the visual domains of the available training datasets, which are largely born-digital. The large collections of sheet music held by libraries and other heritage institutions consist predominantly of manuscripts, whose visual domains are highly diverse and differ from those of the training data, so existing OMR systems fail when applied in the real world. These institutions are often resource-constrained, so large in-domain datasets cannot be expected. We provide a first baseline on real-world manuscripts with complex piano notation in this resource-constrained scenario. Using fine-grained music notation graph (MuNG) annotations and the Smashcima synthesis tool, we then show that while some direct transcriptions of in-domain data remain essential, domain adaptation using synthetic musical manuscript images brings significant improvement. Furthermore, the symbols used for synthesis do not need to be in-domain, so expensive fine-grained in-domain annotation can be avoided. We thus bring OMR closer to one of its stated goals: preserving and promoting musical cultural heritage.
