Optical Character Recognition (OCR) is fundamental to Vision-Language Models (VLMs) and to generating high-quality data for LLM training. Yet despite gains in average OCR accuracy, state-of-the-art VLMs still struggle to detect sample-level errors and lack effective unsupervised quality control. We introduce Consensus Entropy (CE), a training-free, model-agnostic metric that estimates output reliability by measuring the entropy of inter-model agreement. The core insight is that correct predictions converge in output space, while errors diverge. Building on CE, we develop CE-OCR, a lightweight multi-model framework that verifies outputs by ensemble agreement, selects the best outputs, and further improves efficiency through adaptive routing. Experiments show that CE is robust for quality verification, improving F1 scores by 42.1% over VLM-as-Judge. CE-OCR achieves consistent OCR gains, outperforming self-consistency and single-model baselines at the same cost. Notably, CE requires no training or supervision, enabling plug-and-play integration. Code: https://github.com/Aslan-yulong/consensus-entropy.
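The abstract does not state the CE formula, so the following is only a minimal sketch of the underlying idea (agreeing outputs suggest a reliable result, diverging outputs suggest an error), not the authors' implementation. It substitutes mean pairwise string dissimilarity for the entropy measure. The function names, the `difflib`-based distance, and the `0.1` threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_disagreement(a: str, b: str) -> float:
    """Return 1 - similarity ratio: 0.0 for identical strings, 1.0 for no overlap."""
    return 1.0 - SequenceMatcher(None, a, b, autojunk=False).ratio()


def disagreement_score(outputs: list[str]) -> float:
    """Mean pairwise disagreement across model outputs.

    Assumption: this is a stand-in for Consensus Entropy, not the paper's formula.
    """
    pairs = list(combinations(outputs, 2))
    if not pairs:
        raise ValueError("need at least two model outputs")
    return sum(pairwise_disagreement(a, b) for a, b in pairs) / len(pairs)


def select_best(outputs: list[str]) -> str:
    """Pick the output that is, on average, closest to all the others."""
    if len(outputs) < 2:
        raise ValueError("need at least two model outputs")

    def mean_distance(i: int) -> float:
        others = [o for j, o in enumerate(outputs) if j != i]
        return sum(pairwise_disagreement(outputs[i], o) for o in others) / len(others)

    return outputs[min(range(len(outputs)), key=mean_distance)]


def is_reliable(outputs: list[str], threshold: float = 0.1) -> bool:
    """Accept a sample when the models agree closely (hypothetical threshold)."""
    return disagreement_score(outputs) <= threshold


# Hypothetical OCR outputs from three models for the same image.
agreeing = ["invoice 2024", "invoice 2024", "invo1ce 2O24"]
diverging = ["abc", "xyz", "123"]

print(select_best(agreeing))
print(is_reliable(diverging))
```

In a routing setup along the lines the abstract describes, a sample that fails `is_reliable` could be sent to a stronger model or flagged for review, while samples that pass keep the output chosen by `select_best`.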