Low-dose computed tomography (LDCT) reduces X-ray radiation exposure at the cost of degraded image quality, characterized by increased noise and artifacts. Recently, transformer models have emerged as a promising way to enhance LDCT image quality. However, the success of such models relies on a large number of paired noisy and clean images, which are often scarce in clinical settings. In computer vision and natural language processing, masked autoencoders (MAE) have been recognized as an effective label-free self-pretraining method for transformers, owing to their exceptional feature representation ability. However, the original pretraining and fine-tuning design fails to work for low-level vision tasks such as denoising. To address this challenge, we redesign the classical encoder-decoder learning model and propose a simple yet effective low-level vision MAE, referred to as LoMAE, tailored to the LDCT denoising problem. Moreover, we introduce an MAE-GradCAM method to shed light on the latent learning mechanisms of MAE/LoMAE. Additionally, we explore LoMAE's robustness and generalizability across a variety of noise levels. Experimental results show that the proposed LoMAE enhances the transformer's denoising performance and greatly reduces the dependence on ground-truth clean data. It also demonstrates remarkable robustness and generalizability over a spectrum of noise levels.
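For readers unfamiliar with masked autoencoders, the following is a minimal sketch of the generic MAE-style random patch masking step, in which a fraction of image patches is hidden and a model is later trained to reconstruct them. It is not the LoMAE design, which the abstract does not specify. The function name `random_patch_mask`, the patch size, the 75% mask ratio, and the choice to zero out masked pixels are all illustrative assumptions.

```python
import numpy as np


def random_patch_mask(image, patch=4, mask_ratio=0.75, seed=0):
    """Hide a random subset of non-overlapping patches (generic MAE-style masking).

    Hypothetical helper for illustration only; not taken from the paper.
    Returns the masked image and a boolean pixel mask (True = hidden).
    """
    h, w = image.shape
    if h % patch != 0 or w % patch != 0:
        raise ValueError("image size must be divisible by the patch size")

    grid_h, grid_w = h // patch, w // patch
    n_patches = grid_h * grid_w
    n_masked = int(round(mask_ratio * n_patches))

    # Pick which patches to hide, without replacement.
    rng = np.random.default_rng(seed)
    masked_ids = rng.choice(n_patches, size=n_masked, replace=False)

    patch_mask = np.zeros(n_patches, dtype=bool)
    patch_mask[masked_ids] = True
    patch_mask = patch_mask.reshape(grid_h, grid_w)

    # Expand the patch-level mask to pixel resolution.
    pixel_mask = patch_mask.repeat(patch, axis=0).repeat(patch, axis=1)

    # Assumption: hidden pixels are set to zero; visible pixels are unchanged.
    masked_image = np.where(pixel_mask, 0.0, image)
    return masked_image, pixel_mask


# Toy 8x8 "image" with strictly positive values, split into four 4x4 patches.
image = np.arange(1, 65, dtype=float).reshape(8, 8)
masked_image, pixel_mask = random_patch_mask(image, patch=4, mask_ratio=0.75)
```

In a pretraining loop, a reconstruction loss would compare the model output with the original image, so no clean/noisy pairs are needed for this stage.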