Discovering Limitations of Image Quality Assessments with Noised Deep Learning Image Sets

Image quality affects overall performance in image processing and computer vision, among other areas. Image quality assessment (IQA) is consequently a vital task in applications ranging from aerial photography interpretation to object detection to medical image analysis. In previous research, the BRISQUE and PSNR algorithms were evaluated on high-resolution (at least 512x384 pixels) but relatively small image sets (no more than 4,744 images). However, IQA algorithms have not been evaluated on low-resolution (no more than 32x32 pixels), multi-perturbation, large image sets (for example, at least 60,000 different images, not counting their perturbations). This study examines these two IQA algorithms experimentally. We first chose two deep learning image sets, CIFAR-10 and MNIST. We then applied 68 perturbations that add noise to the images in specific sequences and at specific noise intensities, and we tracked the outputs of the two IQA algorithms on singly and multiply noised images. After quantitatively analyzing the experimental results, we report the limitations of the two IQA algorithms on these noised CIFAR-10 and MNIST image sets, and we explain three potential root causes of the performance degradation. These findings expose weaknesses of the two IQA algorithms and provide guidance to scientists and engineers developing accurate, robust IQA algorithms. All source code, related image sets, and figures are shared on the website (https://github.com/caperock/imagequality) to support future scientific and industrial projects.
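As a rough illustration of the kind of measurement described above, the following minimal sketch scores a singly and a doubly noised low-resolution image with PSNR. It is not the code from the study's repository: the noise types (Gaussian, then salt-and-pepper), the intensities (`sigma=10.0`, `fraction=0.05`), the helper names, and the random 32x32 stand-in image are all assumptions chosen for illustration, and the 68 perturbations used in the study are not reproduced here.

```python
import numpy as np


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    ref = reference.astype(np.float64)
    dist = distorted.astype(np.float64)
    mse = np.mean((ref - dist) ** 2)
    if mse == 0:
        return float("inf")  # identical images: PSNR is unbounded
    return 10.0 * np.log10(max_value ** 2 / mse)


def add_gaussian_noise(image, sigma, rng):
    """Hypothetical perturbation: additive zero-mean Gaussian noise."""
    noisy = image.astype(np.float64) + rng.normal(0.0, sigma, image.shape)
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)


def add_salt_and_pepper(image, fraction, rng):
    """Hypothetical perturbation: set roughly `fraction` of pixels to 0 or 255."""
    noisy = image.copy()
    mask = rng.random(image.shape[:2])      # one draw per pixel location
    noisy[mask < fraction / 2] = 0          # pepper
    noisy[mask > 1 - fraction / 2] = 255    # salt
    return noisy


rng = np.random.default_rng(0)
# Random stand-in for one 32x32 RGB image (assumption; not a real CIFAR-10 sample).
clean = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

single = add_gaussian_noise(clean, sigma=10.0, rng=rng)        # singly noised
double = add_salt_and_pepper(single, fraction=0.05, rng=rng)   # multiply noised

psnr_single = psnr(clean, single)
psnr_double = psnr(clean, double)
print(f"PSNR after one perturbation:  {psnr_single:.2f} dB")
print(f"PSNR after two perturbations: {psnr_double:.2f} dB")
```

PSNR is a full-reference measure, so it needs the clean image alongside the noised one. BRISQUE is no-reference and would be computed from the noised image alone; it is omitted here because it requires a pretrained model.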
