LIT-Former: Linking In-plane and Through-plane Transformers for Simultaneous CT Image Denoising and Deblurring

This paper studies 3D low-dose computed tomography (CT) imaging. Although various deep learning methods have been developed in this context, they typically perform denoising (to counter low-dose noise) and deblurring (for super-resolution) separately. To date, little work has addressed simultaneous in-plane denoising and through-plane deblurring, which is important for improving clinical CT images. A straightforward approach to this task is to train an end-to-end 3D network directly. However, such a network demands much more training data and incurs high computational cost. Here, we propose to link in-plane and through-plane transformers for simultaneous in-plane denoising and through-plane deblurring, termed LIT-Former. LIT-Former efficiently synergizes the in-plane and through-plane sub-tasks for 3D CT imaging and combines the advantages of convolutional and transformer networks. LIT-Former has two novel designs: efficient multi-head self-attention modules (eMSM) and efficient convolutional feed-forward networks (eCFN). First, eMSM integrates in-plane 2D self-attention and through-plane 1D self-attention to efficiently capture the global interactions of 3D self-attention, the core unit of transformer networks. Second, eCFN integrates 2D convolution and 1D convolution in the same fashion to extract the local information of 3D convolution. As a result, LIT-Former synergizes the two sub-tasks, significantly reduces computational complexity compared with its 3D counterparts, and converges rapidly. Extensive experiments on simulated and clinical datasets demonstrate superior performance over state-of-the-art models.
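To make the efficiency argument concrete, here is a back-of-envelope sketch. It counts pairwise attention scores and convolution weights for a full 3D operator and for a factorized in-plane plus through-plane one. It assumes plain spatial (token-to-token) attention with one token per voxel and a kernel size of 3. The function names and the 8 × 64 × 64 patch size are hypothetical. This is not the LIT-Former implementation, and the actual eMSM and eCFN may compute attention and convolution differently.

```python
"""Back-of-envelope cost comparison for factorized 3D operators.

Hypothetical sketch: counts spatial attention-score pairs and convolution
kernel weights. It is NOT the LIT-Former eMSM/eCFN implementation.
"""


def full_3d_attention_pairs(depth: int, height: int, width: int) -> int:
    """Every voxel token attends to every other token in the volume."""
    n = depth * height * width
    return n * n


def factorized_attention_pairs(depth: int, height: int, width: int) -> int:
    """In-plane 2D attention within each slice, then 1D attention along z."""
    plane = height * width
    in_plane = depth * plane * plane       # D slices, each (H*W)^2 pairs
    through_plane = plane * depth * depth  # H*W columns, each D^2 pairs
    return in_plane + through_plane


def full_3d_kernel_weights(k: int) -> int:
    """Weights per (input, output) channel pair for a k x k x k kernel."""
    return k ** 3


def factorized_kernel_weights(k: int) -> int:
    """A k x k in-plane kernel followed by a length-k through-plane kernel."""
    return k * k + k


if __name__ == "__main__":
    d, h, w = 8, 64, 64  # hypothetical patch: 8 slices of 64 x 64 pixels
    full = full_3d_attention_pairs(d, h, w)
    fact = factorized_attention_pairs(d, h, w)
    print(f"attention pairs: 3D={full:,} factorized={fact:,} "
          f"ratio={full / fact:.1f}x")
    print(f"kernel weights (k=3): 3D={full_3d_kernel_weights(3)} "
          f"factorized={factorized_kernel_weights(3)}")
```

Under these assumptions, the factorized attention needs N·(HW + D) score pairs instead of N², where N = D·H·W. The saving is therefore N/(HW + D), which is roughly the number of slices D when HW ≫ D. For the patch above, that is about 8x. The 2D-plus-1D kernel uses k² + k weights instead of k³, which gives 12 versus 27 for k = 3.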

翻译：本文研究三维低剂量计算机断层扫描（CT）成像问题。尽管现有多种深度学习方法被应用于该领域，但通常分别针对低剂量导致的噪声去除和超分辨率导致的模糊去除进行单独处理。迄今为止，鲜有工作涉及同时实现面内去噪与穿层去模糊——这一对改善临床CT图像至关重要的任务。针对该问题，直接方法为训练端到端三维网络，但此类方法需大量训练数据且计算成本高昂。本文提出联合面内与穿层Transformer（LIT-Former）以实现面内去噪与穿层去模糊的同步处理。该网络能有效协同三维CT成像中的面内与穿层子任务，并兼具卷积网络与Transformer网络的优势。LIT-Former包含两项创新设计：高效多头自注意力模块（eMSM）与高效卷积前馈网络（eCFN）。首先，eMSM融合面内二维自注意力与穿层一维自注意力，高效捕捉三维自注意力的全局交互——后者为Transformer网络的核心单元。其次，eCFN以相同方式融合二维卷积与一维卷积，提取三维卷积的局部信息。由此，LIT-Former实现两个子任务的协同，相较于三维网络显著降低计算复杂度，并加速收敛。在模拟数据集与临床数据集上的大量实验表明，该模型性能优于现有最优方法。