DSVTLA: Deep Swin Vision Transformer-Based Transfer Learning Architecture for Multi-Type Cancer Histopathological Image Classification

In this study, we propose a deep Swin Vision Transformer-based transfer learning architecture for robust multi-cancer histopathological image classification. The proposed framework integrates a hierarchical Swin Transformer with ResNet50-based convolutional feature extraction, enabling the model to capture both long-range contextual dependencies and fine-grained local morphological patterns within histopathological images. To validate the effectiveness of the proposed architecture, extensive experiments were conducted on a comprehensive multi-cancer dataset covering breast cancer, oral cancer, lung and colon cancer, kidney cancer, and acute lymphocytic leukemia (ALL). Both original and segmented images were analyzed to assess model robustness across heterogeneous clinical imaging conditions. Our approach is benchmarked against several state-of-the-art CNN and transformer models, including DenseNet121, DenseNet201, InceptionV3, ResNet50, EfficientNetB3, multiple ViT variants, and Swin Transformer models. All models were trained and validated using a unified pipeline incorporating balanced data preprocessing, transfer learning, and fine-tuning strategies. The experimental results demonstrate that the proposed architecture consistently achieved superior performance, reaching 100% test accuracy on the lung-colon cancer and segmented leukemia datasets and up to 99.23% accuracy for breast cancer classification. The model also achieved near-perfect precision, recall, and F1 score, indicating highly stable performance across diverse cancer types. Overall, the proposed model establishes a highly accurate, interpretable, and robust multi-cancer classification system, sets a strong benchmark for future research, and provides a unified comparative assessment to support the design of reliable AI-assisted histopathological diagnosis and clinical decision-making.
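The abstract reports accuracy, precision, recall, and F1 score but does not state how per-class scores are aggregated across cancer subtypes. The following is a minimal sketch, assuming macro averaging (an unweighted mean over classes), of how these metrics can be computed from true and predicted labels. The function name `classification_metrics` and the example labels are hypothetical illustrations, not part of the authors' pipeline.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall and F1.

    Assumption: macro averaging (unweighted mean over classes); the
    abstract does not specify the averaging scheme actually used.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # class p was predicted but is wrong
            fn[t] += 1  # true class t was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical two-class example: one benign slide misclassified as malignant.
metrics = classification_metrics(
    ["benign", "benign", "malignant", "malignant"],
    ["benign", "malignant", "malignant", "malignant"],
)
print(metrics)
```

In this toy example accuracy and macro recall are both 0.75, while macro precision (about 0.833) and macro F1 (about 0.733) differ. This shows why a single "near-perfect" figure can hide per-class differences unless the averaging scheme is reported.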