Medical imaging plays a significant role in detecting and treating various diseases. However, these images are often of poor quality, which leads to decreased efficiency, extra expense, and even incorrect diagnoses. We therefore propose a retinal image enhancement method that combines a vision transformer with a convolutional neural network. The method builds a cycle-consistent generative adversarial network trained on unpaired datasets. It consists of two generators that translate images from one domain to the other (e.g., low- to high-quality and vice versa) and play an adversarial game with two discriminators. The generators try to produce images that the discriminators cannot tell apart from original ones, while the discriminators try to distinguish original images from generated ones. Each generator combines a vision transformer (ViT) encoder with a convolutional neural network (CNN) decoder; the discriminators use conventional CNN encoders. The enhanced images are evaluated quantitatively with metrics such as peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM), and qualitatively through vessel segmentation. The proposed method reduces the adverse effects of blurring, noise, illumination disturbances, and color distortions while largely preserving structural and color information. Experimental results show the superiority of the proposed method: on the test sets, PSNR is 31.138 dB for the first dataset and 27.798 dB for the second, and SSIM is 0.919 and 0.904, respectively.
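As an illustration of two quantities mentioned in the abstract, the following minimal sketch computes PSNR between a reference and an enhanced image, and a cycle-consistency penalty for a pair of generators. It is not the authors' implementation. Several things here are assumptions made for the example: images are represented as flattened sequences of pixel intensities, the cycle penalty is taken to be the mean absolute error after a round trip (a common choice in cycle-consistent GANs, not confirmed by the abstract), and the names `psnr`, `cycle_consistency_loss`, `enhance`, and `degrade` are hypothetical.

```python
import math
from typing import Callable, Sequence

# Assumption for this sketch: an image is a flattened sequence of intensities.
Pixels = Sequence[float]
Generator = Callable[[Pixels], Pixels]


def psnr(reference: Pixels, enhanced: Pixels, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    if len(reference) == 0 or len(reference) != len(enhanced):
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((r - e) ** 2 for r, e in zip(reference, enhanced)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_abs_error(a: Pixels, b: Pixels) -> float:
    """Mean absolute difference between two equally sized images."""
    if len(a) == 0 or len(a) != len(b):
        raise ValueError("images must be non-empty and of equal size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    low: Pixels, high: Pixels, enhance: Generator, degrade: Generator
) -> float:
    """Round-trip error: low -> high -> low plus high -> low -> high."""
    low_cycle = degrade(enhance(low))
    high_cycle = enhance(degrade(high))
    return mean_abs_error(low_cycle, low) + mean_abs_error(high_cycle, high)


if __name__ == "__main__":
    # Toy stand-ins for the two generators (hypothetical, not ViT/CNN models).
    toy_enhance = lambda img: [2.0 * p for p in img]
    toy_degrade = lambda img: [p / 2.0 for p in img]
    print(cycle_consistency_loss([10.0, 20.0], [40.0, 80.0], toy_enhance, toy_degrade))
    print(psnr([100.0, 120.0, 140.0], [101.0, 119.0, 142.0]))
```

In a real pipeline the toy lambdas would be replaced by the trained generators, and the metrics would normally be computed with an array library over full images rather than Python lists.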