In this study, we evaluate several state-of-the-art super-resolution (SR) models, namely the GAN-based (generative adversarial network) ESRGAN and Real-ESRGAN and the residual network EDSR, on a benchmark dataset of real-world images that have been passed through a degradation pipeline. Our results show that some models substantially increase the resolution of the input images while preserving their visual quality, which we assess using the Tesseract OCR engine. We observe that the EDSR-BASE model from Hugging Face outperforms the other candidate models on both quantitative metrics and subjective visual quality assessments, with the lowest compute overhead. Specifically, EDSR generates images with higher peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) values, and these images yield high-quality OCR results with Tesseract. These findings suggest that EDSR is a robust and effective approach to single-image super-resolution and may be particularly well suited to applications that require both high visual fidelity and low compute cost.
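The abstract names PSNR and OCR output quality as evaluation criteria, so a small sketch of how such scores could be computed may help make them concrete. The code below is illustrative only and is not the authors' evaluation code. The function names `psnr` and `char_error_rate` are hypothetical, images are assumed to be flattened sequences of 8-bit pixel intensities, and scoring OCR output by character error rate is an assumption, since the abstract does not say how the Tesseract results were scored. SSIM is left out because it needs windowed statistics that are better taken from an image-processing library.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], restored: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def char_error_rate(truth: str, ocr_text: str) -> float:
    """Levenshtein edit distance divided by the ground-truth length.

    Assumed scoring scheme for OCR output; 0.0 means a perfect match.
    """
    if not truth:
        raise ValueError("ground-truth text must be non-empty")
    # Row-by-row dynamic programming over the edit-distance table.
    prev = list(range(len(ocr_text) + 1))
    for i, t in enumerate(truth, start=1):
        curr = [i]
        for j, o in enumerate(ocr_text, start=1):
            cost = 0 if t == o else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(truth)
```

In a comparison like the one described, each model's upscaled output would be scored against the undegraded original with `psnr`, and the text Tesseract extracts from the upscaled image would be scored against a known transcript with `char_error_rate`. Higher PSNR and lower error rate would both point to a better reconstruction.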