SwinVFTR: A Novel Volumetric Feature-learning Transformer for 3D OCT Fluid Segmentation

Accurately segmenting fluid in 3D volumetric optical coherence tomography (OCT) images is a crucial yet challenging task for detecting eye diseases. Traditional autoencoder-based segmentation approaches are limited in extracting fluid regions because of successive resolution loss in the encoding phase and the inability to recover the lost information in the decoding phase. Although current transformer-based models for medical image segmentation address this limitation, they are not designed to be applied out-of-the-box to 3D OCT volumes, whose channel-axis size (number of B-scans) varies widely depending on the vendor device and extraction technique. To address these issues, we propose SwinVFTR, a new transformer-based architecture designed for precise fluid segmentation in 3D volumetric OCT images. We first utilize channel-wise volumetric sampling to train on OCT volumes with varying depths (B-scans). Next, the model uses a novel shifted window transformer block in the encoder to achieve better localization and segmentation of fluid regions. Additionally, we propose a new volumetric attention block for spatial and depth-wise attention, which improves upon traditional residual skip connections. Trained with a multi-class Dice loss, the proposed architecture outperforms other existing architectures on three publicly available vendor-specific OCT datasets, namely Spectralis, Cirrus, and Topcon, with mean Dice scores of 0.72, 0.59, and 0.68, respectively. SwinVFTR also outperforms the other architectures on two additional relevant metrics, mean intersection-over-union (Mean-IOU) and structural similarity measure (SSIM).
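Channel-wise volumetric sampling can be illustrated with a minimal NumPy sketch. It assumes that each volume and its label mask are stored as arrays of shape (B-scans, height, width), that each training sample is a fixed number of contiguous B-scans, and that shallower volumes are zero-padded. The function name `sample_subvolume` and its `depth` parameter are hypothetical and are not taken from the SwinVFTR implementation.

```python
import numpy as np


def sample_subvolume(volume, mask, depth, rng):
    """Hypothetical sampler: crop `depth` contiguous B-scans from a volume and mask.

    volume, mask: arrays of shape (num_bscans, H, W); num_bscans varies by vendor.
    Volumes with fewer than `depth` B-scans are zero-padded along the B-scan axis
    (an assumption of this sketch).
    """
    num_bscans = volume.shape[0]
    if num_bscans < depth:
        pad = ((0, depth - num_bscans), (0, 0), (0, 0))
        return np.pad(volume, pad), np.pad(mask, pad)
    # Uniformly random start index over all valid crop positions.
    start = rng.integers(0, num_bscans - depth + 1)
    return volume[start:start + depth], mask[start:start + depth]


rng = np.random.default_rng(0)
for n in (20, 40, 100):  # illustrative depths only
    vol = rng.random((n, 8, 8))
    msk = rng.integers(0, 4, size=(n, 8, 8))
    sub_vol, sub_msk = sample_subvolume(vol, msk, depth=32, rng=rng)
    print(n, sub_vol.shape, sub_msk.shape)
```

Every sample then has the same B-scan depth regardless of the source device. Padding the mask with zeros assigns the padded B-scans to the background class (label 0), which is likewise an assumption of this sketch.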
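The shifted window transformer block in the encoder builds on Swin-style windowed self-attention. As a generic sketch, not the paper's novel block, the following partitions a 3D feature map of shape (D, H, W, C) into non-overlapping cubic windows, optionally after a cyclic shift. `window_partition_3d` is a hypothetical helper. It assumes that each spatial size is divisible by `window`, and it omits the attention mask that Swin applies to shifted windows.

```python
import numpy as np


def window_partition_3d(x, window, shift=0):
    """Split a (D, H, W, C) feature map into non-overlapping cubic windows.

    Returns an array of shape (num_windows, window**3, C). With shift > 0 the
    map is first rolled by -shift along D, H and W (Swin-style cyclic shift).
    Assumes D, H and W are divisible by `window`; attention masks are omitted.
    """
    if shift:
        x = np.roll(x, shift=(-shift, -shift, -shift), axis=(0, 1, 2))
    d, h, w, c = x.shape
    x = x.reshape(d // window, window, h // window, window, w // window, window, c)
    # Bring the three window-grid axes to the front, then flatten each window.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, window ** 3, c)


feat = np.arange(4 * 4 * 4, dtype=float).reshape(4, 4, 4, 1)
print(window_partition_3d(feat, window=2).shape)  # (8, 8, 1)
```

Self-attention is then computed independently within each window. Alternating unshifted and shifted partitions in successive blocks lets information flow between neighbouring windows.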
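The multi-class Dice loss used for training can be written in a common soft formulation. For each class c, Dice_c = (2 Σ p_c g_c + ε) / (Σ p_c + Σ g_c + ε), and the loss is one minus the mean of Dice_c over classes. The sketch below assumes softmax probabilities of shape (C, D, H, W), an integer label map, and equal weighting of all classes including background. The exact weighting and smoothing used by SwinVFTR are not stated here, and `multiclass_dice_loss` is a hypothetical name.

```python
import numpy as np


def multiclass_dice_loss(probs, labels, eps=1e-6):
    """Soft multi-class Dice loss (generic formulation, assumed for illustration).

    probs:  float array (C, D, H, W) of per-class softmax probabilities.
    labels: integer array (D, H, W) with class indices in [0, C).
    Returns (loss, per_class_dice).
    """
    num_classes = probs.shape[0]
    # One-hot encode the labels and move the class axis first: (C, D, H, W).
    one_hot = np.moveaxis(np.eye(num_classes)[labels], -1, 0)
    spatial_axes = tuple(range(1, probs.ndim))
    intersection = (probs * one_hot).sum(axis=spatial_axes)
    denominator = probs.sum(axis=spatial_axes) + one_hot.sum(axis=spatial_axes)
    # eps keeps classes absent from both prediction and label at Dice = 1.
    per_class_dice = (2.0 * intersection + eps) / (denominator + eps)
    return 1.0 - per_class_dice.mean(), per_class_dice
```

Passing a one-hot encoding of hard predictions instead of probabilities yields per-class Dice scores of the kind averaged into a mean Dice score.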
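Mean-IOU, one of the additional evaluation metrics, averages the per-class intersection-over-union of hard label maps. In this sketch, `mean_iou` is a hypothetical helper. It skips classes absent from both prediction and ground truth rather than scoring them, which is one of several conventions in use.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps; skipped in this convention
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```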

翻译：准确分割三维容积光学相干断层扫描（OCT）图像中的积液是检测眼部疾病的关键且具有挑战性的任务。传统的基于自编码器的分割方法因编码阶段连续分辨率损失以及解码阶段无法恢复丢失信息，在提取积液区域方面存在局限性。尽管当前基于Transformer的医学图像分割模型解决了这一局限，但它们并未设计为可直接应用于三维OCT容积数据——该类数据因不同厂商设备和提取技术而具有宽泛的通道轴尺寸。为解决上述问题，我们提出SwinVFTR，一种专为精准分割三维容积OCT图像中积液而设计的新型Transformer架构。首先，我们采用逐通道容积采样方法，对具有不同深度（B扫描）的OCT体数据进行训练。随后，模型在编码器中利用新型移位窗口Transformer模块，实现积液区域的更优定位与分割。此外，我们提出一种新型容积注意力模块，用于空间和深度维度的注意力机制，改进了传统的残差跳跃连接。最终，结合多类Dice损失函数，所提架构在Spectralis、Cirrus和Topcon三种公开厂商专用OCT数据集上均优于现有架构，平均Dice系数分别达到0.72、0.59和0.68。同时，SwinVFTR在平均交并比（Mean-IOU）和结构相似性度量（SSIM）两项相关指标上亦超越其他架构。