We propose a new approach for the second stage of a practical two-stage Optical Music Recognition (OMR) pipeline. Given the symbol and event candidates produced by the visual stage, we decode them into an editable, verifiable, and exportable score structure. We focus on complex polyphonic staff notation, especially piano scores, where voice separation and intra-measure timing are the main bottlenecks. Our approach formulates second-stage decoding as a structure decoding problem and uses topology recognition with probability-guided search (BeadSolver) as its core method. We also describe a data strategy that combines procedural generation with recognition-feedback annotations. The result is a practical decoding component for real OMR systems and a path to accumulating structured score data for future end-to-end, multimodal, and RL-style methods.