Handwritten Mathematical Expression Recognition (HMER) requires reasoning over diverse symbols and 2D structural layouts, yet autoregressive models suffer from exposure bias and syntactic inconsistency. We present a discrete diffusion framework that reformulates HMER as iterative symbolic refinement rather than sequential generation. Through multi-step remasking, the proposed method progressively refines both symbols and structural relations, removing causal dependencies and improving structural consistency. Symbol-aware tokenization and Random-Masking Mutual Learning further enhance syntactic alignment and robustness to handwriting diversity. On the MathWriting benchmark, the proposed method achieves a character error rate (CER) of 5.56\% and an exact match (EM) rate of 60.42\%, outperforming strong Transformer and commercial baselines. Consistent gains on CROHME 2014--2023 indicate that discrete diffusion offers a new paradigm for structure-aware visual recognition, extending its use beyond generative modeling.
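For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how a character error rate and an exact match rate are commonly computed over predicted and reference strings. It assumes character-level Levenshtein distance normalized by total reference length; the abstract does not specify the tokenization or normalization used in the evaluation, and the function names `edit_distance`, `cer`, and `exact_match` are illustrative rather than taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(refs, hyps):
    """Total edit distance divided by total reference length (assumed convention)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars


def exact_match(refs, hyps):
    """Fraction of predictions identical to their reference."""
    return sum(r == h for r, h in zip(refs, hyps)) / len(refs)


if __name__ == "__main__":
    # Hypothetical reference/prediction pairs, for illustration only.
    refs = ["x^2", "a+b"]
    hyps = ["x^2", "a-b"]
    print(f"CER = {cer(refs, hyps):.4f}, EM = {exact_match(refs, hyps):.2f}")
```

Because both functions accept any sequence, the same sketch applies to lists of LaTeX tokens if the error rate is measured at the token level instead of the character level.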