WiTUnet: A U-Shaped Architecture Integrating CNN and Transformer for Improved Feature Alignment and Local Information Fusion

Low-dose computed tomography (LDCT) has become the technology of choice for diagnostic medical imaging because it delivers a lower radiation dose than standard CT, but the reduced dose increases image noise and can compromise diagnostic accuracy. To address this, deep learning-based LDCT denoising algorithms have been developed, primarily using Convolutional Neural Networks (CNNs) or Transformer networks within the Unet architecture. This architecture enhances image detail by integrating feature maps from the encoder and decoder via skip connections. However, current methods often overlook enhancements to the Unet architecture itself, focusing instead on optimizing encoder and decoder structures. This is problematic because encoder and decoder feature maps differ significantly in their characteristics, so simple fusion strategies may not reconstruct images effectively. In this paper, we introduce WiTUnet, a novel LDCT image denoising method that replaces traditional skip connections with nested, dense skip pathways to improve feature integration. WiTUnet also incorporates a windowed Transformer structure that processes images in smaller, non-overlapping segments, reducing computational load. Additionally, a Local Image Perception Enhancement (LiPe) module replaces the standard multi-layer perceptron (MLP) of the Transformer in both the encoder and decoder, enhancing local feature capture and representation. In extensive experimental comparisons, WiTUnet outperforms existing methods on key metrics such as Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM), and Root Mean Square Error (RMSE), significantly improving noise removal and image quality.
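
To illustrate why processing an image in non-overlapping windows reduces computational load, the following is a minimal NumPy sketch of window partitioning. It is not the WiTUnet implementation: the function names (`window_partition`, `window_reverse`, `attention_pairs`), the `(H, W, C)` layout, and the window size are all assumptions chosen for illustration, and the cost model only counts attention-score entries.

```python
import numpy as np


def window_partition(x, win):
    """Split an (H, W, C) feature map into non-overlapping win x win windows.

    Returns an array of shape (num_windows, win * win, C).
    Hypothetical helper; assumes H and W are divisible by win.
    """
    h, w, c = x.shape
    assert h % win == 0 and w % win == 0, "H and W must be divisible by win"
    x = x.reshape(h // win, win, w // win, win, c)
    # Bring the two window-grid axes together, then flatten each window.
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, c)


def window_reverse(windows, win, h, w):
    """Inverse of window_partition: rebuild the (H, W, C) feature map."""
    c = windows.shape[-1]
    x = windows.reshape(h // win, w // win, win, win, c)
    return x.transpose(0, 2, 1, 3, 4).reshape(h, w, c)


def attention_pairs(h, w, win=None):
    """Number of token-pair scores self-attention computes.

    Global attention scores every pair of the H*W tokens; windowed
    attention scores pairs only within each win x win window.
    """
    n = h * w
    if win is None:
        return n * n
    tokens_per_window = win * win
    return (n // tokens_per_window) * tokens_per_window ** 2


if __name__ == "__main__":
    feat = np.arange(8 * 8 * 2, dtype=np.float32).reshape(8, 8, 2)
    wins = window_partition(feat, 4)
    print(wins.shape)
    print(np.array_equal(window_reverse(wins, 4, 8, 8), feat))
    print(attention_pairs(64, 64), attention_pairs(64, 64, win=8))
```

Under these assumptions, a 64 x 64 token grid needs 16,777,216 pairwise scores with global attention but only 262,144 with 8 x 8 windows, a 64-fold reduction. Each window can then be passed through an attention layer independently before `window_reverse` restores the spatial layout.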

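
Two of the metrics named in the abstract, RMSE and PSNR, follow directly from the mean squared error between a reference image and a denoised image. The sketch below uses the standard definitions; the function names and the `data_range` parameter are illustrative assumptions, not the paper's evaluation code, and SSIM is omitted because it requires local windowed statistics.

```python
import numpy as np


def rmse(reference, estimate):
    """Root mean square error between two images of the same shape."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    return float(np.sqrt(np.mean((ref - est) ** 2)))


def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB.

    data_range is the maximum possible intensity span of the images
    (an assumption the caller must supply, e.g. 1.0 for images in [0, 1]).
    """
    err = rmse(reference, estimate)
    if err == 0.0:
        return float("inf")  # identical images
    return float(20.0 * np.log10(data_range / err))


if __name__ == "__main__":
    clean = np.zeros((4, 4))
    noisy = np.full((4, 4), 0.1)
    print(rmse(clean, noisy), psnr(clean, noisy, data_range=1.0))
```

Lower RMSE and higher PSNR both indicate that the denoised image is closer to the reference, which is why the two metrics move in opposite directions when a method improves.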