This paper presents Diffusion Model for Scene Text Recognition (DiffusionSTR), an end-to-end text recognition framework that uses diffusion models to recognize text in the wild. While existing studies treat scene text recognition as an image-to-text transformation, we reframe it as a text-to-text transformation conditioned on images within a diffusion model. We show for the first time that diffusion models can be applied to text recognition. Furthermore, experimental results on publicly available datasets show that the proposed method achieves accuracy competitive with state-of-the-art methods.