Analysis of the Two-Step Heterogeneous Transfer Learning for Laryngeal Blood Vessel Classification: Issue and Improvement

Transferring features learned from natural images to medical images for classification is common. However, challenges arise from the scarcity of certain types of medical images and from the feature disparities between natural and medical images. Two-step transfer learning has been recognized as a promising solution to this issue, but choosing an appropriate intermediate domain is critical to further improving classification performance. In this work, we explore the effectiveness of using a dataset of color fundus photographs of diabetic retinas as the intermediate domain for two-step heterogeneous transfer learning (THTL) to classify laryngeal vascular images with nine deep learning models. Experimental results show that, although images in both the intermediate and target domains share vascularized characteristics, accuracy drops drastically compared with one-step transfer learning in which only the last layer is fine-tuned (e.g., by 14.7% for ResNet18 and 14.8% for ResNet50). By analyzing Layer Class Activation Maps (LayerCAM), we uncover a novel finding: the prevalent radial vascular pattern in the intermediate domain prevents the models from learning the features of twisted and tangled vessels that distinguish the malignant class in the target domain. To address this performance drop, we propose a Step-Wise Fine-Tuning (SWFT) method for ResNet in the second step of THTL, which yields substantial accuracy improvements: compared with a second step in which only the last layer is fine-tuned, accuracy increases by 26.1% for ResNet18 and 20.4% for ResNet50. Additionally, compared with training from scratch, using ImageNet as the source domain slightly improves classification performance on laryngeal vascular images, but the differences are not significant.
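
LayerCAM, the tool used above to diagnose the performance drop, builds a class activation map by weighting each spatial location of a layer's feature maps with the positive part of the class-score gradient at that location, summing over channels, and applying a ReLU. The following is a minimal NumPy sketch of that rule, not the authors' code. It assumes the activations and gradients of one chosen layer have already been extracted for a single image (for example, with framework hooks); the function name `layercam` and the optional min-max normalization are illustrative choices, not part of the source.

```python
import numpy as np


def layercam(activations: np.ndarray, gradients: np.ndarray, normalize: bool = True) -> np.ndarray:
    """Sketch of a LayerCAM map for one layer and one image (hypothetical helper).

    activations: feature maps A^k of shape (K, H, W).
    gradients:   dy^c / dA^k for the target class c, same shape (K, H, W).
    Returns an (H, W) class activation map.
    """
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must have the same shape")
    # Element-wise weights: keep only the positive gradients (ReLU).
    weights = np.maximum(gradients, 0.0)
    # Weight every location of every channel, sum over channels, then ReLU.
    cam = np.maximum((weights * activations).sum(axis=0), 0.0)
    if normalize:
        # Min-max scaling to [0, 1] for visualization (an assumed convention).
        lo, hi = cam.min(), cam.max()
        cam = (cam - lo) / (hi - lo) if hi > lo else np.zeros_like(cam)
    return cam
```

Because the weights are element-wise rather than one averaged scalar per channel, the map can also be computed for shallower layers, which may help when inspecting fine structures such as vessel shapes. In practice the map would be upsampled to the input resolution and overlaid on the image.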
