Despite the tremendous success of convolutional neural networks (CNNs) in computer vision, their decision mechanism still lacks a clear interpretation. Class activation mapping (CAM), a widely used visualization technique for interpreting CNN decisions, has drawn increasing attention. Gradient-based CAMs are efficient, but their performance is heavily affected by vanishing and exploding gradients. In contrast, gradient-free CAMs avoid computing gradients and produce more understandable results. However, existing gradient-free CAMs are time-consuming because they require hundreds of forward inferences per image. In this paper, we propose Cluster-CAM, an effective and efficient gradient-free CNN interpretation algorithm. Cluster-CAM significantly reduces the number of forward propagations by splitting the feature maps into clusters in an unsupervised manner. Furthermore, we propose a strategy to forge a cognition-base map and cognition-scissors from the clustered feature maps. The final saliency heatmap is computed by merging these cognition maps. Qualitative results show that Cluster-CAM produces heatmaps whose highlighted regions match human cognition more precisely than those of existing CAMs. Quantitative evaluation further demonstrates the superiority of Cluster-CAM in both effectiveness and efficiency.
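To make the efficiency argument concrete, the following minimal sketch groups flattened feature maps into clusters, so that one forward pass per cluster could replace one per channel. It is an illustration under stated assumptions, not the paper's implementation: the abstract only says the clustering is unsupervised, so the use of k-means, the summation of maps within a cluster, and all names (`kmeans`, `merge_clusters`, `maps`) are hypothetical choices made here. The cognition-base map and cognition-scissors are not specified in the abstract and are not modeled.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means (an assumed choice of unsupervised clustering).

    Returns one cluster label per input vector.
    """
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each feature map goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(v, centers[c]))
                  for v in vectors]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def merge_clusters(vectors, labels, k):
    """Sum the maps in each cluster (assumed merge rule) into one map per cluster."""
    merged = []
    for c in range(k):
        members = [v for v, lab in zip(vectors, labels) if lab == c]
        merged.append([sum(col) for col in zip(*members)])
    return merged


# Six toy "feature maps", each flattened to 4 pixels: three respond on the
# left half of the image, three on the right half.
maps = [
    [1.0, 1.0, 0.0, 0.0], [0.9, 1.0, 0.0, 0.0], [1.0, 0.9, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 0.9, 1.0], [0.0, 0.0, 1.0, 0.9],
]
k = 2
labels = kmeans(maps, k)
merged = merge_clusters(maps, labels, k)

# A per-channel gradient-free method would need len(maps) masked forward
# passes; with clustering only len(merged) masks are needed.
print("forward passes without clustering:", len(maps))
print("forward passes with clustering:", len(merged))
```

In a real network the last convolutional layer typically has hundreds of channels, so scoring a small number of cluster-level masks instead of one mask per channel is where the reduction in forward propagations would come from.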