Image quality assessment (IQA) is standard practice when developing novel machine learning algorithms that operate on images. The most commonly used IQA measures have been developed and tested for natural images, but not in the medical setting. Reported inconsistencies on medical images are not surprising, since medical images have different properties from natural images. In this study, we test the applicability of common IQA measures to medical image data by comparing their scores with manual ratings of chest X-ray images (5 experts) and photoacoustic images (2 experts). Moreover, we include supplementary studies on grayscale natural images and accelerated brain MRI data. All experiments show a similar outcome, in line with previous findings for medical images: PSNR and SSIM in their default settings rank in the lower part of the results, and HaarPSI outperforms the other tested measures in overall performance. The full-reference measures FSIM, LPIPS and MS-SSIM are also among the top performers in our experiments. In general, the results on natural images yield considerably higher correlations, suggesting that IQA measures tailored to medical imaging algorithms should additionally be employed.
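The comparison described above, scoring images with a full-reference measure and checking how well those scores agree with expert ratings, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not state which correlation coefficient was used (Spearman rank correlation is chosen below), the images are synthetic noise patterns rather than medical data, and the ratings in `expert_ratings` are invented. PSNR stands in only because it is the simplest of the measures named.

```python
import numpy as np


def psnr(reference, distorted, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, data_range]."""
    reference = np.asarray(reference, dtype=np.float64)
    distorted = np.asarray(distorted, dtype=np.float64)
    mse = np.mean((reference - distorted) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    values = np.asarray(values, dtype=np.float64)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=np.float64)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# Synthetic stand-in data: one reference image and increasingly noisy copies.
rng = np.random.default_rng(0)
reference = rng.random((64, 64))
noise_levels = [0.01, 0.05, 0.10, 0.20, 0.40]
distorted = [
    np.clip(reference + rng.normal(0.0, sigma, reference.shape), 0.0, 1.0)
    for sigma in noise_levels
]

# Hypothetical mean expert ratings (higher is better), invented for this sketch.
expert_ratings = [4.8, 4.1, 3.0, 2.2, 1.1]

scores = [psnr(reference, d) for d in distorted]
rho = spearman(scores, expert_ratings)
print(f"Spearman correlation between PSNR and ratings: {rho:.3f}")
```

On this toy data the agreement is trivially high, because a single distortion type degrades both PSNR and the invented ratings monotonically. The abstract's point is that such agreement drops on real medical images, where the distortions that matter to experts are not the ones PSNR penalizes.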