The opacity of deep learning models often impedes their deployment in high-stakes domains. We propose a training framework that aligns model focus with class-representative features without requiring pixel-level annotations. To this end, we introduce Batch-CAM, a vectorised implementation of Gradient-weighted Class Activation Mapping (Grad-CAM) that integrates directly into the training loop with minimal computational overhead. We define two regularisation terms: a Prototype Loss, which aligns each sample's attention map with the global class average, and a Batch-CAM Loss, which enforces attention consistency within a training batch; both are instantiated with L1, L2, and SSIM distance metrics. Validated on MNIST and Fashion-MNIST with ResNet18 and ConvNeXt-V2 backbones, our method generates significantly more coherent and human-interpretable saliency maps than the baselines. While maintaining competitive classification accuracy, the framework suppresses spurious feature activation, as evidenced by qualitative reconstruction analysis. Batch-CAM thus appears to offer a scalable pathway for training intrinsically interpretable models, leveraging batch-level statistics to guide feature extraction and bridging the gap between predictive performance and explainability.
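As a rough illustration of the two ideas named in the abstract, the sketch below computes Grad-CAM saliency maps for a whole batch in one vectorised pass and then an L1-style Prototype Loss against per-class average maps. This is a minimal NumPy mock-up under stated assumptions, not the authors' implementation: the function names `batch_cam` and `prototype_loss` are illustrative, and the activation/gradient tensors would in practice come from a framework's backward hooks rather than be passed in directly.

```python
import numpy as np

def batch_cam(activations, gradients):
    """Vectorised Grad-CAM over a batch (illustrative sketch).

    activations, gradients: shape (B, K, H, W) — target-layer feature maps
    and gradients of the class score w.r.t. them.
    Returns saliency maps of shape (B, H, W), min-max normalised per sample.
    """
    # Grad-CAM channel weights: global-average-pool the gradients.
    weights = gradients.mean(axis=(2, 3))                        # (B, K)
    # Weighted sum over channels, then ReLU.
    cam = (weights[:, :, None, None] * activations).sum(axis=1)  # (B, H, W)
    cam = np.maximum(cam, 0.0)
    # Per-sample min-max normalisation to [0, 1].
    flat = cam.reshape(cam.shape[0], -1)
    lo = flat.min(axis=1)[:, None, None]
    hi = flat.max(axis=1)[:, None, None]
    return (cam - lo) / (hi - lo + 1e-8)

def prototype_loss(cams, labels, num_classes):
    """L1 distance between each sample's map and its class-average map
    (one possible reading of the Prototype Loss; hypothetical details)."""
    total = 0.0
    for c in range(num_classes):
        mask = labels == c
        if not mask.any():
            continue
        proto = cams[mask].mean(axis=0)        # class prototype map
        total += np.abs(cams[mask] - proto).sum()
    return total / cams.size
```

In a training loop, `batch_cam` would be called on the stored forward activations and hooked gradients of the final convolutional layer, and `prototype_loss` added to the cross-entropy objective with a weighting coefficient.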