LegoNet: Alternating Model Blocks for Medical Image Segmentation

Ikboljon Sobirov, Cheng Xie, Muhammad Siddique, Parijat Patel, Kenneth Chan, Thomas Halborg, Christos Kotanidis, Zarqiash Fatima, Henry West, Keith Channon, Stefan Neubauer, Charalambos Antoniades, Mohammad Yaqub

from arXiv, 12 pages, 5 figures, 4 tables

Since the emergence of convolutional neural networks (CNNs), and later vision transformers (ViTs), the common paradigm for model development has been to stack blocks of a single type, varying only their parameters and hyper-parameters. To leverage the benefits of different architectural designs (e.g. CNNs and ViTs), we propose alternating structurally different types of blocks to build a new architecture, much as Lego blocks are assembled. Using two CNN-based blocks and one SwinViT-based block, we investigate three variations of the resulting architecture, LegoNet, which applies this concept of block alternation to medical image segmentation. We also study a clinical problem that has not been investigated before: segmentation of the right internal mammary artery (RIMA) and its perivascular space from computed tomography angiography (CTA), which has demonstrated prognostic value for major cardiovascular outcomes. We compare model performance against popular CNN and ViT architectures on two large datasets (e.g. achieving a Dice similarity coefficient (DSC) of 0.749 on the larger dataset). We also evaluate the model on three external testing cohorts, where an expert clinician corrected the model's segmentations (DSC > 0.90 for all three cohorts). To assess the suitability of the proposed model for clinical use, we perform intra- and inter-observer variability analysis. Finally, we investigate a joint self-supervised learning approach to assess its impact on model performance. The code and the pretrained model weights will be available upon acceptance.
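
The results above are reported as Dice similarity coefficients (DSC). As a reminder of what that number measures, here is a minimal sketch of the standard definition, DSC = 2|P ∩ T| / (|P| + |T|), for two binary masks. It is not the authors' evaluation code: the function name `dice_coefficient`, the flattened-list input, and the convention that two empty masks score 1.0 are all assumptions made for illustration.

```python
from typing import Iterable


def dice_coefficient(pred: Iterable[int], target: Iterable[int]) -> float:
    """Dice similarity coefficient between two flattened binary masks.

    DSC = 2 * |P ∩ T| / (|P| + |T|), where P and T are the sets of
    foreground voxels in the prediction and in the reference mask.
    """
    pred = [bool(v) for v in pred]
    target = [bool(v) for v in target]
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")

    # Voxels marked as foreground in both masks.
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example (hypothetical data): 3 predicted voxels, 3 reference voxels,
# 2 of them overlapping, so DSC = 2*2 / (3+3) = 2/3.
predicted = [1, 1, 1, 0, 0, 0]
reference = [0, 1, 1, 1, 0, 0]
print(dice_coefficient(predicted, reference))
```

A 3D segmentation volume can be passed in after flattening it to a one-dimensional sequence of 0/1 labels; for multi-class masks, the same function would be applied once per class.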
