An automatic document classification system is presented that detects textual content in images and classifies documents into four predefined categories (Invoice, Report, Letter, and Form). The system supports both offline images (e.g., files on flash drives, HDDs, or microSD cards) and real-time capture via connected cameras, and is designed to mitigate practical challenges such as variable illumination, arbitrary orientation, curved or partially occluded text, low resolution, and distant text. The pipeline comprises four stages: image capture, preprocessing, text detection [1] using a DBNet++ (Differentiable Binarization Network Plus) detector, and text classification [2] using a BART (Bidirectional and Auto-Regressive Transformers) classifier, all integrated within a user interface implemented in Python with PyQt5. For text detection in images, the system achieved about 92.88% over 10 hours on the Total-Text dataset, which consists of high-resolution images that simulate a variety of very difficult challenges. The results indicate that the proposed approach is effective for practical, mixed-source document categorization in unconstrained imaging scenarios.
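The staged design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `TextRegion`, `DocumentPipeline`, `preprocess`, `fake_detector`, and `keyword_classifier` are hypothetical, the contrast stretch merely stands in for whatever preprocessing the system performs, and the detector and classifier are simple placeholders where DBNet++ and BART would be plugged in. The abstract also does not say how text is read from detected regions, so the stub detector here is assumed to return the text directly.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# The four categories named in the abstract.
CATEGORIES = ("Invoice", "Report", "Letter", "Form")

# Assumption: a grayscale image stored as rows of 0-255 integers.
Image = List[List[int]]


@dataclass
class TextRegion:
    """A detected text region: its outline polygon and the text it holds."""
    polygon: List[Tuple[int, int]]
    text: str


def preprocess(image: Image) -> Image:
    """Stretch contrast to the full 0-255 range (placeholder preprocessing)."""
    pixels = [p for row in image for p in row]
    lo, hi = min(pixels), max(pixels)
    if hi == lo:  # flat image: nothing to stretch
        return [row[:] for row in image]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in image]


def fake_detector(image: Image) -> List[TextRegion]:
    """Placeholder for a text detector such as DBNet++; returns fixed regions."""
    box = [(0, 0), (1, 0), (1, 1), (0, 1)]
    return [TextRegion(box, "Dear customer"), TextRegion(box, "Sincerely yours")]


def keyword_classifier(text: str) -> str:
    """Placeholder for a BART classifier: counts hypothetical cue words."""
    cues = {
        "Invoice": ("invoice", "total due"),
        "Report": ("report", "summary"),
        "Letter": ("dear", "sincerely"),
        "Form": ("form", "signature"),
    }
    lowered = text.lower()
    scores = {label: sum(lowered.count(c) for c in words)
              for label, words in cues.items()}
    # Ties (including all-zero scores) resolve to the earliest category.
    return max(CATEGORIES, key=lambda label: scores[label])


@dataclass
class DocumentPipeline:
    """Chains preprocessing, detection, and classification for one image."""
    detect: Callable[[Image], Sequence[TextRegion]]
    classify: Callable[[str], str]

    def run(self, image: Image) -> str:
        regions = self.detect(preprocess(image))
        text = " ".join(region.text for region in regions)
        label = self.classify(text)
        if label not in CATEGORIES:
            raise ValueError(f"unexpected category: {label!r}")
        return label


if __name__ == "__main__":
    pipeline = DocumentPipeline(detect=fake_detector, classify=keyword_classifier)
    print(pipeline.run([[10, 20], [30, 40]]))
```

Keeping the detector and classifier as injected callables means a real model wrapper can replace either placeholder without changing `DocumentPipeline.run`, and the same entry point can serve both stored images and camera frames.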