Swin-Tempo: Temporal-Aware Lung Nodule Detection in CT Scans as Video Sequences Using Swin Transformer-Enhanced UNet

Lung cancer is highly lethal, which makes early detection critical. Identifying lung nodules is nevertheless difficult for radiologists, whose diagnostic accuracy depends heavily on expertise and experience. Computer-aided diagnosis systems based on machine learning have therefore emerged to help doctors identify lung nodules in computed tomography (CT) scans. Existing networks in this domain, however, often suffer from high computational complexity and from high rates of false negatives and false positives, which limits their effectiveness. To address these challenges, we present a model that combines the strengths of convolutional neural networks and vision transformers. Inspired by object detection in videos, we treat each 3D CT image as a video, its individual slices as frames, and lung nodules as objects, which allows time-series techniques to be applied. The primary objective of our work is to overcome hardware limitations during model training: the model processes 2D data efficiently while using inter-slice information to identify nodules in their 3D image context. We validated the proposed network with 10-fold cross-validation on the publicly available Lung Nodule Analysis 2016 (LUNA16) dataset. The proposed architecture achieves an average sensitivity of 97.84% and a competition performance metric (CPM) of 96.0% with few parameters. Comparison with state-of-the-art lung nodule detection methods shows that the proposed model attains high accuracy.
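To make the video analogy concrete, the following minimal sketch shows one way to group the slices of a CT volume into short "clips" of neighbouring frames, so that a 2D model can see inter-slice context. It is an illustration only and not the preprocessing used in this work: the function name `frame_windows`, the `context` parameter, and the edge-clamping behaviour are all assumptions introduced here.

```python
from typing import List, Tuple


def frame_windows(num_slices: int, context: int = 1) -> List[Tuple[int, ...]]:
    """Return, for each slice index, the indices of its window of frames.

    Each window holds 2 * context + 1 slice indices centred on the target
    slice. Indices that would fall outside the volume are clamped to the
    first or last slice (edge replication), which is an illustrative choice.
    """
    if num_slices < 1:
        raise ValueError("num_slices must be positive")
    if context < 0:
        raise ValueError("context must be non-negative")

    windows = []
    for center in range(num_slices):
        window = tuple(
            min(max(center + offset, 0), num_slices - 1)
            for offset in range(-context, context + 1)
        )
        windows.append(window)
    return windows


if __name__ == "__main__":
    # A toy volume of 5 slices, with one neighbouring slice on each side.
    for window in frame_windows(5, context=1):
        print(window)
```

Each tuple can then index into the stored volume to build a small stack of 2D slices, so memory use scales with the window size rather than with the full 3D scan.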

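For readers unfamiliar with the CPM reported above, the LUNA16 challenge defines it as the mean sensitivity at seven predefined false-positive rates (1/8, 1/4, 1/2, 1, 2, 4, and 8 false positives per scan). The sketch below computes that average; the function name is hypothetical and the sensitivity values in the usage example are made up for illustration, not results from this work.

```python
from typing import Sequence

# The seven LUNA16 operating points, in false positives per scan.
FP_RATES = (0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0)


def competition_performance_metric(sensitivities: Sequence[float]) -> float:
    """Average the sensitivities measured at the seven FP_RATES."""
    if len(sensitivities) != len(FP_RATES):
        raise ValueError(
            f"expected {len(FP_RATES)} sensitivities, got {len(sensitivities)}"
        )
    if any(not 0.0 <= s <= 1.0 for s in sensitivities):
        raise ValueError("sensitivities must lie in [0, 1]")
    return sum(sensitivities) / len(sensitivities)


if __name__ == "__main__":
    # Hypothetical sensitivities, one per entry in FP_RATES.
    example = [0.90, 0.92, 0.94, 0.96, 0.97, 0.98, 0.99]
    print(f"CPM = {competition_performance_metric(example):.4f}")
```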