In this study, we evaluate the performance of several state-of-the-art SRGAN (Super-Resolution Generative Adversarial Network) models, namely ESRGAN, Real-ESRGAN, and EDSR, on a benchmark dataset of real-world images that are degraded using a degradation pipeline. Our results show that some models substantially increase the resolution of the input images while preserving their visual quality, which we assess using the Tesseract OCR engine. We observe that the EDSR-BASE model from Hugging Face outperforms the other candidate models in both quantitative metrics and subjective visual quality assessments, with the lowest compute overhead. Specifically, EDSR generates images with higher peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) values, and these images yield high-quality OCR results with the Tesseract OCR engine. These findings suggest that EDSR is a robust and effective approach for single-image super-resolution and may be particularly well suited to applications that require high visual fidelity together with efficient use of compute.
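As an illustration of the PSNR metric named above, the following is a minimal sketch of how it can be computed for a pair of 8-bit grayscale images. It is not the evaluation code used in this study: the function names `mse` and `psnr`, the nested-list image representation, and the default peak value of 255 are assumptions made for the example.

```python
import math


def mse(reference, restored):
    """Mean squared error between two equally sized grayscale images.

    Images are given as lists of rows of pixel intensities
    (an assumed representation, chosen to keep the sketch dependency-free).
    """
    if len(reference) != len(restored) or any(
        len(a) != len(b) for a, b in zip(reference, restored)
    ):
        raise ValueError("images must have the same shape")

    total = 0.0
    count = 0
    for ref_row, res_row in zip(reference, restored):
        for r, s in zip(ref_row, res_row):
            total += (r - s) ** 2
            count += 1

    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in decibels: 10 * log10(MAX^2 / MSE)."""
    err = mse(reference, restored)
    if err == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)
```

A higher PSNR means the restored image is closer to the reference in a pixel-wise sense. It does not by itself capture perceptual quality or text legibility, which is one reason to report it alongside SSIM and OCR results.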