We introduce PaddleOCR-VL-1.5, an upgraded model that achieves a new state-of-the-art (SOTA) accuracy of 94.5% on OmniDocBench v1.5. To rigorously evaluate robustness against real-world physical distortions (scanning, skew, warping, screen photography, and illumination), we propose the Real5-OmniDocBench benchmark. Experimental results show that the upgraded model also attains SOTA performance on this newly curated benchmark. Furthermore, we extend the model's capabilities with seal recognition and text spotting, while it remains an ultra-compact, highly efficient 0.9B VLM. Code: https://github.com/PaddlePaddle/PaddleOCR