This paper discusses the challenges of optical character recognition (OCR) on natural scenes, which is harder than OCR on documents because scene content is unconstrained ("in the wild") and image backgrounds vary widely. We propose using word error rate (WER) uniformly as a new metric for evaluating scene-text OCR, covering both end-to-end (e2e) performance and the performance of individual system components. We name the e2e metric DISGO WER because it accounts for Deletion, Insertion, Substitution, and Grouping/Ordering errors. Finally, we propose using the concept of super blocks to automatically compute BLEU scores for e2e OCR machine translation. We use the small SCUT public test set to demonstrate the WER performance of a modularized OCR system.
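As an illustration of the word-level error counting that WER rests on, the following is a minimal Python sketch of the conventional WER: the number of substituted, deleted, and inserted words in a minimum-edit alignment, divided by the number of reference words. It is not the DISGO WER computation itself; the abstract does not specify how grouping/ordering errors are scored, so they are not modelled here. The function names `word_edit_counts` and `wer`, the whitespace tokenization, and the example strings are all assumptions made for illustration.

```python
from typing import List, Tuple


def word_edit_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) for one minimum-edit
    word alignment of hyp against ref (standard Levenshtein DP)."""
    n, m = len(ref), len(hyp)
    # cell[i][j] = (total_edits, subs, dels, ins) for ref[:i] vs hyp[:j]
    cell = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cell[i][0] = (i, 0, i, 0)  # delete every reference word
    for j in range(1, m + 1):
        cell[0][j] = (j, 0, 0, j)  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                cell[i][j] = cell[i - 1][j - 1]  # match, no cost
                continue
            t, s, d, k = cell[i - 1][j - 1]
            substitution = (t + 1, s + 1, d, k)
            t, s, d, k = cell[i - 1][j]
            deletion = (t + 1, s, d + 1, k)
            t, s, d, k = cell[i][j - 1]
            insertion = (t + 1, s, d, k + 1)
            # Tuples compare on total edits first, so the total is minimal.
            cell[i][j] = min(substitution, deletion, insertion)
    _, subs, dels, ins = cell[n][m]
    return subs, dels, ins


def wer(reference: str, hypothesis: str) -> float:
    """Conventional WER = (S + D + I) / number of reference words.
    Assumes whitespace tokenization, which is a simplification."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    subs, dels, ins = word_edit_counts(ref, hyp)
    return (subs + dels + ins) / len(ref)


if __name__ == "__main__":
    # Hypothetical sign text: one reference word is missing from the OCR output.
    print(wer("exit 12 next right", "exit 12 right"))
```

Because insertions are counted against the reference length, this ratio can exceed 1.0 when the hypothesis contains many spurious words. When several alignments share the same minimum cost, the split between substitutions, deletions, and insertions depends on tie-breaking, while the total stays the same.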