Despite the widespread use of unsupervised models, very few methods are designed to explain them. Most explanation methods explain a scalar model output. However, unsupervised models output representation vectors, whose elements are poor candidates for explanation because they lack semantic meaning. To bridge this gap, recent works defined a scalar explanation output: a dot-product-based similarity, in the representation space, to the sample being explained (i.e., an explicand). Although this enabled explanations of unsupervised models, the interpretation of this approach can still be opaque because similarity to the explicand's representation may not be meaningful to humans. To address this, we propose contrastive corpus similarity, a novel and semantically meaningful scalar explanation output based on a reference corpus and a contrasting foil set of samples. We demonstrate that contrastive corpus similarity is compatible with many post-hoc feature attribution methods to generate COntrastive COrpus Attributions (COCOA) and quantitatively verify that features important to the corpus are identified. We showcase the utility of COCOA in two ways: (i) we draw insights by explaining augmentations of the same image in a contrastive learning setting (SimCLR); and (ii) we perform zero-shot object localization by explaining the similarity of image representations to jointly learned text representations (CLIP).
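The abstract names contrastive corpus similarity but does not give its formula. The following is a minimal, hypothetical sketch that assumes (i) similarity between representations is cosine similarity, (ii) the scalar explanation output is the mean similarity of the explicand's representation to the corpus minus its mean similarity to the foil set, and (iii) a simple occlusion (feature-removal) attribution stands in for the post-hoc feature attribution methods mentioned above. The linear `encoder`, the weights `W`, and the toy `corpus`, `foil`, and `explicand` arrays are invented for illustration; this is not the COCOA reference implementation.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero for all-zero representations


def cosine_similarity(a, b):
    """Pairwise cosine similarity between rows of `a` (n, d) and rows of `b` (m, d)."""
    a = a / (np.linalg.norm(a, axis=-1, keepdims=True) + EPS)
    b = b / (np.linalg.norm(b, axis=-1, keepdims=True) + EPS)
    return a @ b.T  # shape (n, m)


def contrastive_corpus_similarity(encoder, x, corpus, foil):
    """Assumed form: mean cosine similarity of f(x) to the corpus
    minus mean cosine similarity of f(x) to the foil set."""
    z = encoder(x[None, :])      # explicand representation, shape (1, d)
    z_corpus = encoder(corpus)   # corpus representations, shape (|C|, d)
    z_foil = encoder(foil)       # foil representations, shape (|F|, d)
    return float(cosine_similarity(z, z_corpus).mean()
                 - cosine_similarity(z, z_foil).mean())


def occlusion_attribution(score_fn, x, baseline=0.0):
    """Stand-in post-hoc attribution: how much the scalar score drops
    when each input feature is replaced by `baseline`."""
    full_score = score_fn(x)
    attributions = np.empty_like(x, dtype=float)
    for i in range(x.shape[0]):
        x_masked = x.copy()
        x_masked[i] = baseline
        attributions[i] = full_score - score_fn(x_masked)
    return attributions


# Hypothetical toy encoder: features 0-1 feed representation dim 0,
# features 2-3 feed representation dim 1.
W = np.array([[1.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])


def encoder(inputs):
    return inputs @ W


corpus = np.array([[1.0, 0.0, 0.0, 0.0],   # corpus maps onto representation dim 0
                   [0.0, 1.0, 0.0, 0.0]])
foil = np.array([[0.0, 0.0, 1.0, 0.0],     # foil maps onto representation dim 1
                 [0.0, 0.0, 0.0, 1.0]])
explicand = np.array([1.0, 1.0, 1.0, 0.0])


def score(v):
    # Scalar explanation target for the explicand (or a masked copy of it)
    return contrastive_corpus_similarity(encoder, v, corpus, foil)


attributions = occlusion_attribution(score, explicand)
print(round(score(explicand), 3), np.round(attributions, 3))
```

In this toy setup, features 0 and 1 receive positive attributions because they move the explicand's representation toward the corpus, feature 2 receives a negative attribution because it moves it toward the foil set, and feature 3, which is already zero, receives none. Any other attribution method that accepts a scalar-valued function could be swapped in for `occlusion_attribution`.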