Detecting and recognizing text in camera-captured images and videos remains a highly challenging research problem. Although recent methods achieve high accuracy, they still need substantial improvement before they can be applied in practical scenarios. Unlike general text detection in images and videos, this paper addresses text detection in license plates by combining multiple frames taken from different viewpoints. For each viewpoint, the proposed method extracts descriptive features that characterize the text components of the license plate, specifically their corner points and area. Concretely, we use three viewpoints (view-1, view-2, and view-3) to identify nearest-neighbor components, which allows text components belonging to the same license plate line to be restored based on estimated similarity and distance measures. We then apply the CnOCR method to recognize the text within the license plates. Experimental results on a self-collected dataset (PTITPlates), which comprises pairs of images captured in various scenarios, and on the publicly available Stanford Cars Dataset show that the proposed method outperforms existing approaches.
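The abstract does not specify how components are matched, so the following is only a minimal sketch of one way nearest-neighbor grouping by similarity and distance could work for a single view. It assumes each text component is an axis-aligned box given by its corner coordinates, uses the area ratio as the similarity measure and the center-to-center distance as the distance measure, and links a component to its nearest neighbor when both thresholds are met. The names `Component`, `group_into_lines`, `min_similarity`, and `max_distance_factor`, as well as the threshold values, are hypothetical and not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Component:
    """A candidate text component described by its bounding-box corners."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def area(self) -> float:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)

    @property
    def height(self) -> float:
        return self.y_max - self.y_min

    @property
    def center(self) -> tuple[float, float]:
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def area_similarity(a: Component, b: Component) -> float:
    """Ratio of the smaller area to the larger one, in [0, 1]."""
    larger = max(a.area, b.area)
    return min(a.area, b.area) / larger if larger > 0 else 0.0


def center_distance(a: Component, b: Component) -> float:
    (ax, ay), (bx, by) = a.center, b.center
    return hypot(ax - bx, ay - by)


def group_into_lines(components, min_similarity=0.5, max_distance_factor=2.0):
    """Group components into text lines via nearest-neighbor links.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    parent = list(range(len(components)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, comp in enumerate(components):
        others = [j for j in range(len(components)) if j != i]
        if not others:
            continue
        # Nearest neighbor by center-to-center distance.
        j = min(others, key=lambda k: center_distance(comp, components[k]))
        nearest = components[j]
        limit = max_distance_factor * max(comp.height, nearest.height)
        close = center_distance(comp, nearest) <= limit
        similar = area_similarity(comp, nearest) >= min_similarity
        if close and similar:
            parent[find(i)] = find(j)

    groups = {}
    for i, comp in enumerate(components):
        groups.setdefault(find(i), []).append(comp)

    # Order characters left to right, and lines top to bottom.
    lines = [sorted(g, key=lambda c: c.x_min) for g in groups.values()]
    lines.sort(key=lambda line: min(c.y_min for c in line))
    return lines
```

In this sketch, a large or distant box (for example, background clutter) stays in its own group because it fails the area-ratio or distance test. Combining the groupings from view-1, view-2, and view-3 is not shown, since the abstract gives no detail on that step.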