When handling complicated text images (e.g., irregular structures, low resolution, heavy occlusion, and uneven illumination), existing supervised text recognition methods are data-hungry. Although these methods employ large-scale synthetic text images to reduce the dependence on annotated real images, the domain gap between synthetic and real images still limits recognition performance. Learning robust text feature representations from unlabeled real images through self-supervised learning is therefore a promising solution. However, existing self-supervised text recognition methods conduct sequence-to-sequence representation learning by coarsely splitting the visual features along the horizontal axis. This limits the flexibility of the augmentations, because large geometric augmentations may lead to sequence-to-sequence feature inconsistency. Motivated by this, we propose a novel self-supervised Character-to-Character Distillation method, CCD, which enables versatile augmentations to facilitate general text representation learning. Specifically, we delineate the character structures of unlabeled real images by designing a self-supervised character segmentation module. CCD then enriches the diversity of local characters while keeping their pairwise alignment under flexible augmentations, using the transformation matrix between two augmented views of an image. Experiments demonstrate that CCD achieves state-of-the-art results, with average performance gains of 1.38% in text recognition, 1.7% in text segmentation, and 0.24 dB (PSNR) and 0.0321 (SSIM) in text super-resolution. Code will be released soon.
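The alignment idea above can be illustrated with a minimal sketch. It is not the CCD implementation: it assumes each augmented view is produced by a known 2D affine transform of the original image, and every name in it (`make_affine`, `invert`, `compose`, `char_centers`) is hypothetical. If view A is produced by a transform T_a and view B by T_b, then T_b composed with the inverse of T_a maps any character location in view A to the matching location in view B, however strong the geometric augmentation is.

```python
import math

def make_affine(scale, angle_deg, tx, ty):
    """Return (a, b, c, d, tx, ty) for x' = a*x + b*y + tx, y' = c*x + d*y + ty."""
    t = math.radians(angle_deg)
    return (scale * math.cos(t), -scale * math.sin(t),
            scale * math.sin(t), scale * math.cos(t), tx, ty)

def apply(m, point):
    """Apply an affine transform to a single (x, y) point."""
    a, b, c, d, tx, ty = m
    x, y = point
    return (a * x + b * y + tx, c * x + d * y + ty)

def invert(m):
    """Invert an affine transform (assumes a non-zero determinant)."""
    a, b, c, d, tx, ty = m
    det = a * d - b * c
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return (ia, ib, ic, id_, -(ia * tx + ib * ty), -(ic * tx + id_ * ty))

def compose(m2, m1):
    """Return the transform that applies m1 first, then m2."""
    a1, b1, c1, d1, tx1, ty1 = m1
    a2, b2, c2, d2, tx2, ty2 = m2
    return (a2 * a1 + b2 * c1, a2 * b1 + b2 * d1,
            c2 * a1 + d2 * c1, c2 * b1 + d2 * d1,
            a2 * tx1 + b2 * ty1 + tx2, c2 * tx1 + d2 * ty1 + ty2)

# Hypothetical character centers in the original image (illustrative values only).
char_centers = [(10.0, 16.0), (30.0, 16.0), (50.0, 16.0)]

# Two independent, fairly strong geometric augmentations (assumed parameters).
to_view_a = make_affine(scale=1.2, angle_deg=15.0, tx=4.0, ty=-2.0)
to_view_b = make_affine(scale=0.8, angle_deg=-25.0, tx=-3.0, ty=5.0)

# Transformation between the two views: undo view A, then apply view B.
view_a_to_b = compose(to_view_b, invert(to_view_a))

centers_a = [apply(to_view_a, p) for p in char_centers]
centers_b = [apply(to_view_b, p) for p in char_centers]
mapped = [apply(view_a_to_b, p) for p in centers_a]

# Each character in view A lands on the same character in view B.
aligned = all(
    math.isclose(u, v, abs_tol=1e-9)
    for p, q in zip(mapped, centers_b)
    for u, v in zip(p, q)
)
```

Because the correspondence is computed per character region rather than per horizontal slice, it stays valid under rotation and scaling, which is the property the abstract attributes to character-level alignment.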