Optical character recognition (OCR) is essential in applications such as document processing, license plate recognition, and intelligent surveillance. However, existing OCR models often underperform in real-world scenarios because of irregular text layouts, poor image quality, and character variability, and they are computationally expensive. This paper introduces SDA-Net (Stroke-Sensitive Attention and Dynamic Context Encoding Network), a lightweight architecture designed for robust single-character recognition. SDA-Net combines four components: (1) a dual attention mechanism that enhances stroke-level and spatial feature extraction; (2) a dynamic context encoding module that adaptively refines semantic information through a learnable gating mechanism; (3) a U-Net-inspired feature fusion strategy that combines low-level and high-level features; and (4) an optimized lightweight backbone that reduces memory and computational demands. Experimental results show that SDA-Net achieves state-of-the-art accuracy on challenging OCR benchmarks with significantly faster inference, making it well suited to deployment in real-time and edge-based OCR systems.
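The abstract does not specify how the learnable gating mechanism in the dynamic context encoding module is computed. As a rough illustration only, one common form of gated fusion mixes a feature vector x with a context vector c as g = sigmoid(W[x; c] + b), out = g * x + (1 - g) * c. The sketch below assumes that form; the class name `GatedContextFusion`, its parameters, and the random initialization are hypothetical and are not taken from SDA-Net. It uses plain Python so that it runs without dependencies; in a real model the weights would be learned by gradient descent inside a deep learning framework.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


class GatedContextFusion:
    """Hypothetical gated fusion (an assumption, not the SDA-Net module).

    Computes g = sigmoid(W [x; c] + b) and returns g * x + (1 - g) * c,
    so each output channel is a convex mix of the feature and the context.
    """

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim = dim
        # One weight row per output channel, acting on the concatenation [x; c].
        # Random values stand in for parameters that training would learn.
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(2 * dim)] for _ in range(dim)
        ]
        self.bias = [0.0] * dim

    def gate(self, x, c):
        """Return the per-channel gate values, each strictly between 0 and 1."""
        joint = list(x) + list(c)
        return [
            sigmoid(sum(w * v for w, v in zip(row, joint)) + b)
            for row, b in zip(self.weights, self.bias)
        ]

    def __call__(self, x, c):
        if len(x) != self.dim or len(c) != self.dim:
            raise ValueError("x and c must both have length dim")
        g = self.gate(x, c)
        # Gate near 1 keeps the local feature; gate near 0 defers to the context.
        return [gi * xi + (1.0 - gi) * ci for gi, xi, ci in zip(g, x, c)]


if __name__ == "__main__":
    fusion = GatedContextFusion(dim=3)
    local_feature = [1.0, 2.0, 3.0]   # e.g. a stroke-level descriptor (made up)
    context = [3.0, 2.0, -1.0]        # e.g. a pooled global descriptor (made up)
    print(fusion(local_feature, context))
```

Because the gate lies strictly between 0 and 1, every output channel falls between the corresponding feature and context values, which is what lets such a module blend the two adaptively instead of replacing one with the other.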