Transfer learning followed by fine-tuning is widely adopted in medical image classification because it yields consistent gains in diagnostic performance. However, in multi-class settings with overlapping visual features, higher accuracy does not guarantee that the visual evidence supporting predictions remains stable. We define semantic drift as systematic change, between transfer learning and full fine-tuning, in the attribution structure that supports a model's predictions; such change reflects a potential shift in the underlying visual reasoning even when classification performance is stable. On a five-class chest X-ray task, we evaluate DenseNet201, ResNet50V2, and InceptionV3 under a two-stage training protocol and quantify drift with reference-free metrics that capture the spatial localization and structural consistency of attribution maps. Across architectures, coarse anatomical localization remains stable, whereas overlap IoU reveals pronounced, architecture-dependent reorganization of evidential structure. Beyond single-method analysis, stability rankings can reverse between LayerCAM and GradCAM++ even after predictive performance has converged, establishing that explanation stability depends on the interaction among architecture, optimization phase, and attribution objective.
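To illustrate how coarse localization can stay fixed while overlap IoU drops, the sketch below compares two attribution maps (e.g. LayerCAM or GradCAM++ outputs) for the same image and class, one from the transfer-learning stage and one from the fine-tuned stage. The abstract does not specify how maps are binarized or how localization is measured, so this is a minimal sketch under stated assumptions: maps are binarized by keeping the top 20% of pixels, and coarse localization is proxied by the attribution-weighted centroid. The function names `binarize_top_fraction`, `attribution_centroid`, and `overlap_iou` are hypothetical and are not the authors' implementation.

```python
import numpy as np

# Hypothetical helpers; the thresholding rule (top `fraction` of pixels) is an assumption.

def binarize_top_fraction(cam: np.ndarray, fraction: float = 0.2) -> np.ndarray:
    """Boolean mask keeping roughly the top `fraction` of pixels by attribution value."""
    flat = cam.ravel()
    k = max(1, int(round(fraction * flat.size)))
    # k-th largest value; ties at the threshold may admit slightly more than k pixels.
    threshold = np.partition(flat, flat.size - k)[flat.size - k]
    return cam >= threshold


def attribution_centroid(cam: np.ndarray) -> np.ndarray:
    """Attribution-weighted center of mass (row, col), an assumed proxy for coarse localization."""
    weights = np.clip(cam, 0.0, None)  # ignore negative attributions
    total = weights.sum()
    if total == 0:
        return np.array([(cam.shape[0] - 1) / 2, (cam.shape[1] - 1) / 2])
    rows, cols = np.indices(cam.shape)
    return np.array([(rows * weights).sum() / total, (cols * weights).sum() / total])


def overlap_iou(cam_tl: np.ndarray, cam_ft: np.ndarray, fraction: float = 0.2) -> float:
    """IoU of the binarized maps from the transfer-learning (TL) and fine-tuned (FT) stages."""
    if cam_tl.shape != cam_ft.shape:
        raise ValueError("attribution maps must have the same shape")
    a = binarize_top_fraction(cam_tl, fraction)
    b = binarize_top_fraction(cam_ft, fraction)
    return float(np.logical_and(a, b).sum() / np.logical_or(a, b).sum())
```

For example, a compact central blob (TL stage) and a ring around the same center (FT stage) have identical centroids, so a centroid-based localization measure reports no drift, yet their top-20% masks barely overlap and the IoU is near zero. This is the kind of structural reorganization that coarse localization alone cannot detect.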