We propose a novel unsupervised object localization method that makes the model's predictions explainable by using self-supervised pre-trained models without additional fine-tuning. Existing unsupervised and self-supervised object localization methods often rely on class-agnostic activation maps or self-similarity maps of a pre-trained model. Although these maps offer valuable information for localization, they do little to explain how the model arrives at its predictions. In this paper, we propose a simple yet effective unsupervised object localization method based on representer point selection, in which the model's predictions are expressed as a linear combination of the representer values of training points. By selecting representer points, the training examples that matter most for a given prediction, our method provides insight into how the model predicts the foreground object by surfacing relevant examples together with their importance. Our method outperforms state-of-the-art unsupervised and self-supervised object localization methods on various datasets by significant margins, and even outperforms recent weakly supervised and few-shot methods.
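To make the phrase "a linear combination of representer values of training points" concrete, the following is a minimal sketch of the general representer decomposition on a toy L2-regularized linear regressor. It is not the localization pipeline described in this paper: the random features stand in for self-supervised features, and the function name, the squared loss, and the regularization weight `lam` are all assumptions chosen for illustration.

```python
import numpy as np


def representer_decomposition(train_feats, train_targets, test_feat, lam=0.1):
    """Toy illustration (hypothetical helper, not the paper's method).

    Fits w by minimizing (1/n) * sum_i (w . f_i - y_i)^2 + lam * ||w||^2,
    then rewrites the prediction for test_feat as a sum of per-training-point
    representer values alpha_i * (f_i . test_feat).
    """
    n, d = train_feats.shape
    # Closed-form minimizer of the regularized squared loss.
    gram = train_feats.T @ train_feats / n + lam * np.eye(d)
    w = np.linalg.solve(gram, train_feats.T @ train_targets / n)
    # Setting the gradient to zero gives w = sum_i alpha_i * f_i with
    # alpha_i = -(1 / (2 * lam * n)) * dLoss_i/dPrediction_i.
    residual = train_feats @ w - train_targets
    alpha = -residual / (lam * n)
    # Representer value of each training point for this test feature.
    values = alpha * (train_feats @ test_feat)
    return w, alpha, values


# Random stand-ins for features; real features would come from a pre-trained model.
rng = np.random.default_rng(0)
feats = rng.normal(size=(50, 8))
targets = rng.normal(size=50)
query = rng.normal(size=8)

w, alpha, values = representer_decomposition(feats, targets, query)
direct_prediction = float(w @ query)
decomposed_prediction = float(values.sum())
# Training points with the largest-magnitude contribution to this prediction.
top_points = np.argsort(-np.abs(values))[:5]
```

In this sketch, `direct_prediction` and `decomposed_prediction` agree up to floating-point error, and `top_points` indexes the training examples that contribute most to the prediction, which is the sense in which selected points can serve as an explanation.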