This paper introduces the Controllable Ensemble Transformer and CNN (CETC), a classification model for medical image analysis. CETC combines convolutional neural networks (CNNs) and transformers to capture both local and global features in medical images. The architecture comprises three components: a convolutional encoder block (CEB), a transposed-convolutional decoder block (TDB), and a transformer classification block (TCB). The CEB captures multi-local features at different scales, using components from VGGNet, ResNet, and MobileNet as backbones. The TDB consists of sub-decoders that decode the captured features and sum them using ensemble coefficients, which integrates information from multiple scales. The TCB uses the SwT backbone and a specially designed prediction head to capture global features across the entire image. The paper details the experimental setup and implementation, including transfer learning, data preprocessing, and training settings. CETC is trained and evaluated on two publicly available COVID-19 datasets, where it outperforms existing state-of-the-art models across various evaluation metrics. These results suggest that CETC has potential for accurate and efficient medical image analysis.
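The TDB's summation step can be illustrated with a minimal sketch. The abstract does not say how the ensemble coefficients are chosen or what shape the decoded features take, so everything below is an assumption for illustration: the function name `ensemble_sum`, the use of plain nested lists as 2-D feature maps, and the idea that each sub-decoder has already produced a map of the same size. It is not the authors' implementation.

```python
from typing import List, Sequence

# Hypothetical stand-in for a decoded feature map: rows x columns of floats.
FeatureMap = List[List[float]]


def ensemble_sum(decoded: Sequence[FeatureMap],
                 coefficients: Sequence[float]) -> FeatureMap:
    """Weighted element-wise sum of same-sized feature maps.

    `decoded` holds one map per sub-decoder; `coefficients` holds one
    ensemble coefficient per map (assumed here to be plain scalars).
    """
    if not decoded:
        raise ValueError("at least one decoded feature map is required")
    if len(decoded) != len(coefficients):
        raise ValueError("need exactly one coefficient per feature map")

    rows, cols = len(decoded[0]), len(decoded[0][0])
    for fmap in decoded:
        if len(fmap) != rows or any(len(row) != cols for row in fmap):
            raise ValueError("all feature maps must share the same shape")

    # Accumulate coefficient * feature for every spatial position.
    out = [[0.0] * cols for _ in range(rows)]
    for coeff, fmap in zip(coefficients, decoded):
        for i in range(rows):
            for j in range(cols):
                out[i][j] += coeff * fmap[i][j]
    return out


# Three toy maps standing in for the outputs of three sub-decoders.
map_a = [[1.0, 2.0], [3.0, 4.0]]
map_b = [[2.0, 2.0], [2.0, 2.0]]
map_c = [[10.0, 10.0], [10.0, 10.0]]

# A coefficient of 0.0 removes a sub-decoder's contribution entirely.
fused = ensemble_sum([map_a, map_b, map_c], [1.0, 0.5, 0.0])
print(fused)  # [[2.0, 3.0], [4.0, 5.0]]
```

Under these assumptions, changing a coefficient adjusts how much one scale contributes to the fused features, which is one plausible reading of "controllable" in the model's name.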