Internal representations are crucial for understanding deep neural networks, including their properties and reasoning patterns, but they remain difficult to interpret. Mapping from feature space back to input space aids such interpretation, yet existing approaches often rely on crude approximations. We propose using a conditional diffusion model, a pretrained high-fidelity diffusion model conditioned on spatially resolved feature maps, to learn this mapping in a probabilistic manner. We demonstrate the feasibility of the approach across a variety of pretrained image classifiers, from CNNs to ViTs, showing excellent reconstruction capability. Through qualitative comparisons and robustness analysis, we validate our method and showcase possible applications, such as visualizing concept steering in input space or investigating the composite nature of the feature space. This approach has broad potential for improving feature-space understanding in computer vision models.
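To make the conditioning idea concrete, the following is a minimal sketch, not the paper's actual architecture: spatially resolved feature maps from a frozen classifier are supplied as an extra input to a diffusion noise predictor trained with the standard DDPM objective. The toy `feature_map` extractor, the linear `conditional_denoiser`, and all parameter names here are illustrative stand-ins; the real method uses a pretrained high-fidelity diffusion model and genuine classifier features.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_map(x):
    # Stand-in for a classifier's spatially resolved feature map:
    # a 2x-downsampled average pool of the input.
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

def ddpm_forward(x0, alpha_bar_t, eps):
    # Standard DDPM forward noising: x_t = sqrt(ab_t)*x0 + sqrt(1-ab_t)*eps
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def conditional_denoiser(x_t, t, h, w):
    # Hypothetical linear denoiser: predicts the noise from the noisy image
    # plus the upsampled conditioning feature map. A real model would be a
    # U-Net or transformer taking (x_t, t, h) as inputs.
    h_up = np.kron(h, np.ones((2, 2)))  # broadcast features back to pixels
    return w[0] * x_t + w[1] * h_up + w[2] * t

def training_loss(x0, t, alpha_bar_t, w):
    # One step of the feature-conditioned denoising objective:
    # noise the clean image, condition on its feature map, predict the noise.
    eps = rng.standard_normal(x0.shape)
    x_t = ddpm_forward(x0, alpha_bar_t, eps)
    h = feature_map(x0)  # conditioning signal from the (frozen) classifier
    eps_hat = conditional_denoiser(x_t, t, h, w)
    return float(np.mean((eps_hat - eps) ** 2))

x0 = rng.standard_normal((8, 8))
loss = training_loss(x0, t=0.5, alpha_bar_t=0.7, w=(0.8, 0.1, 0.0))
```

Minimizing this loss over many images trains the denoiser to reconstruct inputs consistent with a given feature map; sampling from the trained model then yields the probabilistic feature-to-input mapping described above.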