We introduce Prototype Generation, a stricter and more robust form of feature visualisation for model-agnostic, data-independent interpretability of image classification models. We demonstrate that it generates inputs that result in natural activation paths, countering previous claims that feature visualisation algorithms are untrustworthy because of the unnatural internal activations they induce. We substantiate this by quantitatively measuring the similarity between the internal activations of our generated prototypes and those of natural images. We also demonstrate how interpreting generated prototypes yields important insights, highlighting spurious correlations and biases learned by models that quantitative methods over test sets cannot identify.