Optical character recognition (OCR) has advanced rapidly with deep learning and multimodal models, yet most methods focus on well-resourced scripts such as Latin and Chinese. Ethnic minority languages remain underexplored due to complex writing systems, scarce annotations, and diverse historical and modern forms, which makes generalization in low-resource or zero-shot settings challenging. To address these challenges, we present OmniOCR, a universal framework for ethnic minority scripts. OmniOCR introduces Dynamic Low-Rank Adaptation (Dynamic LoRA) to allocate model capacity across layers and scripts, enabling effective adaptation while preserving existing knowledge. A sparsity regularizer prunes redundant updates, keeping the adaptation compact and efficient without extra inference cost. Evaluations on TibetanMNIST, Shui, ancient Yi, and Dongba show that OmniOCR outperforms zero-shot foundation models and standard post-training, achieving state-of-the-art accuracy with superior parameter efficiency. Compared with state-of-the-art baseline models, it improves accuracy by 39%-66% on these four datasets. Code: https://github.com/AIGeeksGroup/OmniOCR.
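The abstract does not spell out how Dynamic LoRA and its sparsity regularization are implemented, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes a low-rank update of the form W + B diag(g) A, where each rank component has a gate g, an L1 penalty pushes gates toward zero, components with a zero gate are pruned, and the remaining update is merged into the frozen weight so inference uses a single matrix. All function names, the gate vector, and the soft-thresholding step are hypothetical choices made for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product for small list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def l1_penalty(gates: List[float], lam: float) -> float:
    """Sparsity term added to the training loss (assumed form: lam * ||g||_1)."""
    return lam * sum(abs(g) for g in gates)


def soft_threshold(gates: List[float], step: float) -> List[float]:
    """Proximal step for the L1 penalty: shrink each gate, zeroing small ones."""
    return [max(abs(g) - step, 0.0) * (1.0 if g > 0 else -1.0) for g in gates]


def prune(B: Matrix, gates: List[float], A: Matrix) -> Tuple[Matrix, List[float], Matrix]:
    """Drop rank components whose gate is exactly zero."""
    keep = [i for i, g in enumerate(gates) if g != 0.0]
    B_kept = [[row[i] for i in keep] for row in B]
    A_kept = [A[i] for i in keep]
    return B_kept, [gates[i] for i in keep], A_kept


def merge(W: Matrix, B: Matrix, gates: List[float], A: Matrix) -> Matrix:
    """Fold the gated update B @ diag(gates) @ A into the frozen weight W."""
    if not gates:  # every component was pruned: W is unchanged
        return [row[:] for row in W]
    scaled_A = [[g * v for v in row] for g, row in zip(gates, A)]
    delta = matmul(B, scaled_A)
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


if __name__ == "__main__":
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen weight, 2 x 3
    B = [[1.0, 0.0], [0.0, 1.0]]            # 2 x r, with r = 2
    A = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]  # r x 3
    gates = soft_threshold([0.8, 0.05], step=0.1)
    B, gates, A = prune(B, gates, A)
    merged = merge(W, B, gates, A)
    print("surviving rank:", len(gates))
    print("merged shape:", len(merged), "x", len(merged[0]))
```

Under these assumptions, the number of surviving gates can differ from layer to layer, which is one way capacity could be allocated dynamically, and because the update is merged back into W the deployed model keeps its original shape.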