Image quality assessment (IQA) is standard practice when developing novel machine learning algorithms that operate on images. The most commonly used IQA measures were developed and tested on natural images, not in the medical setting. Reported inconsistencies on medical images are therefore not surprising, since medical images have different properties from natural images. In this study, we test the applicability of common IQA measures to medical image data by comparing their assessments to manual ratings of chest X-ray images (5 experts) and photoacoustic images (1 expert). Moreover, we include supplementary studies on grayscale natural images and accelerated brain MRI data. All experiments show a similar outcome, in line with previous findings for medical imaging: PSNR and SSIM in their default settings rank in the lower range of the result list, and HaarPSI outperforms the other tested measures in overall performance. The full-reference measures DISTS, FSIM, LPIPS and MS-SSIM are also among the top performers in our medical experiments. In general, the results on natural images yield considerably higher correlations, suggesting that tailored IQA measures are additionally needed for medical imaging algorithms.
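The comparison between an IQA measure and manual ratings can be illustrated with a minimal sketch. It assumes that agreement is quantified with Spearman's rank correlation, which the abstract does not specify, and it uses PSNR as the example measure because its definition is standard. The pixel values and the expert ratings below are made-up toy data, and all function names are hypothetical helpers rather than code from the study.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """PSNR in dB between two equally sized images given as flat pixel lists."""
    if len(reference) != len(distorted):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (assumption): one 4-pixel reference and three degraded versions.
reference = [100, 120, 140, 160]
distorted_images = [
    [101, 119, 141, 159],  # mild degradation
    [104, 116, 144, 156],  # moderate degradation
    [110, 110, 150, 150],  # strong degradation
]
expert_ratings = [5, 3, 2]  # hypothetical manual quality scores, higher is better

scores = [psnr(reference, d) for d in distorted_images]
rho = spearman(scores, expert_ratings)
print(scores, rho)
```

A rank correlation only asks whether the measure orders the images the way the raters do, so it does not require the measure and the rating scale to be linearly related. A study with several raters would also need a rule for aggregating their ratings, which this sketch leaves out.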