Most computer vision and machine learning approaches to historical document analysis are tailored to grayscale or RGB images and thus mostly exploit only their spatial information. Multispectral (MS) and hyperspectral (HS) images contain, in addition to spatial information, much richer spectral information than RGB images (usually extending beyond the visible spectral range), which can facilitate more effective feature extraction, more accurate classification and recognition, and thus improved analysis. Although rich spectral information can improve historical document analysis tremendously, HS imagery still has limitations, such as camera-induced noise and blur, that require a carefully designed preprocessing step. Here, we propose novel blind HS image deblurring methods tailored to document images. We exploit the low-rank property of HS images by projecting an HS image onto a lower-dimensional subspace, and we use a text-tailored image prior to perform point spread function (PSF) estimation and deblurring of the subspace components. Preliminary results show that the proposed approach performs well across all spectral bands, successfully removing image artefacts introduced by blur and noise and significantly increasing the number of bands that can be used in further analysis.
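To make the low-rank step concrete, the following is a minimal sketch of projecting an HS cube onto a lower-dimensional spectral subspace. It assumes a plain truncated SVD as the subspace estimator, which may differ from the estimator actually used in this work. The function names `project_to_subspace` and `reconstruct`, the cube layout `(rows, cols, bands)`, and the synthetic data are illustrative assumptions. PSF estimation and deblurring would then operate on the `k` subspace images rather than on every band; that part is not shown.

```python
import numpy as np


def project_to_subspace(cube, k):
    """Project an HS cube (rows, cols, bands) onto its k leading spectral directions.

    Assumption: the subspace is estimated with a plain truncated SVD.
    Returns the basis E (bands x k) and the subspace images Z (k x rows x cols).
    """
    rows, cols, bands = cube.shape
    Y = cube.reshape(rows * cols, bands).T      # bands x pixels
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    E = U[:, :k]                                # orthonormal spectral basis
    Z = E.T @ Y                                 # k x pixels
    return E, Z.reshape(k, rows, cols)


def reconstruct(E, Z):
    """Map subspace images back to a full cube (rows, cols, bands)."""
    k, rows, cols = Z.shape
    Y = E @ Z.reshape(k, rows * cols)           # bands x pixels
    return Y.T.reshape(rows, cols, -1)


# Synthetic example (hypothetical data): 3 materials mixed over 50 bands,
# so the cube has spectral rank at most 3.
rng = np.random.default_rng(0)
abundances = rng.random((16, 20, 3))
spectra = rng.random((3, 50))
cube = abundances @ spectra                     # shape (16, 20, 50)

E, Z = project_to_subspace(cube, k=3)
restored = reconstruct(E, Z)
```

In this sketch only 3 subspace images need to be processed instead of 50 bands, and because the synthetic cube is exactly low rank, `restored` matches `cube` up to floating-point error. Real HS data is only approximately low rank, so the choice of `k` trades fidelity against computation.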