This research develops a method for restoring the topology of camera-captured digital images of paper documents, using algorithms for detection, segmentation, geometry restoration, and dewarping. Our methodology employs deep learning (DL) to detect the document outline, followed by computer vision (CV) techniques that build a topological 2D grid using cubic polynomial interpolation and correct nonlinear distortions by remapping the image. Using classical CV methods makes document topology restoration faster and more efficient, as it requires significantly less computation and memory. We developed a new pipeline for automatic document dewarping and reconstruction, along with a framework and an annotated dataset to demonstrate its efficiency. Our experiments confirm the promise of our methodology and its superiority over existing baselines (including mobile apps and popular DL solutions such as RectiNet, DocGeoNet, and DocTr++), both visually and in terms of document readability, measured via Optical Character Recognition (OCR) and geometry restoration metrics. This paves the way for creating high-quality digital copies of paper documents and enhancing the efficiency of OCR systems. Project page: https://github.com/HorizonParadox/DRCCBI
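To make the grid-and-remap idea concrete, the following is a minimal sketch and not the authors' pipeline. It assumes the outline detector has already produced sample points along the top and bottom page edges, fits a cubic polynomial to each edge, blends the two curves linearly to form a 2D sampling grid, and remaps the image with nearest-neighbour lookup. The function names (`fit_cubic`, `build_grid`, `remap_nearest`), the straight left and right edges, and the linear blending between edges are all illustrative assumptions; the abstract does not specify how the grid is constructed.

```python
import numpy as np


def fit_cubic(xs, ys):
    """Least-squares cubic y = f(x) through sampled edge points."""
    return np.poly1d(np.polyfit(xs, ys, 3))


def build_grid(top_pts, bottom_pts, out_h, out_w):
    """Build source-coordinate maps for an out_h x out_w flattened page.

    top_pts / bottom_pts: (N, 2) arrays of (x, y) points ordered left to
    right along the top and bottom page edges (assumed inputs, N >= 4).
    Returns (map_x, map_y), each of shape (out_h, out_w).
    """
    top = fit_cubic(top_pts[:, 0], top_pts[:, 1])
    bottom = fit_cubic(bottom_pts[:, 0], bottom_pts[:, 1])

    # Sample each edge curve at out_w positions between its end points.
    xs_top = np.linspace(top_pts[0, 0], top_pts[-1, 0], out_w)
    xs_bot = np.linspace(bottom_pts[0, 0], bottom_pts[-1, 0], out_w)
    ys_top, ys_bot = top(xs_top), bottom(xs_bot)

    # Assumption: interior grid rows are linear blends of the two edges.
    t = np.linspace(0.0, 1.0, out_h)[:, None]
    map_x = (1.0 - t) * xs_top + t * xs_bot
    map_y = (1.0 - t) * ys_top + t * ys_bot
    return map_x, map_y


def remap_nearest(image, map_x, map_y):
    """Sample image at (map_x, map_y) with nearest-neighbour lookup."""
    h, w = image.shape[:2]
    xi = np.clip(np.rint(map_x).astype(int), 0, w - 1)
    yi = np.clip(np.rint(map_y).astype(int), 0, h - 1)
    return image[yi, xi]


if __name__ == "__main__":
    # Synthetic example: a page whose top and bottom edges sag in the middle.
    img = np.random.default_rng(0).integers(0, 255, size=(120, 200))
    xs = np.linspace(10, 190, 7)
    sag = 8.0 * np.sin(np.pi * (xs - 10) / 180)
    top_pts = np.stack([xs, 10 + sag], axis=1)
    bottom_pts = np.stack([xs, 100 + sag], axis=1)
    map_x, map_y = build_grid(top_pts, bottom_pts, out_h=90, out_w=180)
    flat = remap_nearest(img, map_x, map_y)
    print(flat.shape)
```

With float32 copies of the same two maps, OpenCV's `cv2.remap` could replace `remap_nearest` to get bilinear or bicubic sampling; nearest-neighbour lookup is used here only to keep the sketch dependency-free beyond NumPy.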