When a vision model performs image recognition, which visual attributes drive its predictions? Detecting unintended reliance on specific visual features is critical for ensuring model robustness, preventing overfitting, and avoiding spurious correlations. We introduce an automated framework for detecting such dependencies in trained vision models. At the core of our method is a self-reflective agent that systematically generates and tests hypotheses about visual attributes that a model may rely on. This process is iterative: the agent refines its hypotheses based on experimental outcomes and uses a self-evaluation protocol to assess whether its findings accurately explain model behavior. When inconsistencies arise, the agent reflects on its findings and triggers a new cycle of experimentation. We evaluate our approach on a novel benchmark of 130 models designed to exhibit diverse visual attribute dependencies across 18 categories. Our results show that the agent's performance consistently improves with self-reflection, yielding a significant gain over non-reflective baselines. We further demonstrate that the agent identifies real-world visual attribute dependencies in state-of-the-art models, including CLIP's vision encoder and the YOLOv8 object detector.
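The hypothesize-test-reflect loop described above can be sketched as follows. This is a minimal illustration only: the function and callback names (`propose`, `run_test`, `evaluate`) are hypothetical placeholders, not the paper's actual interfaces.

```python
def detect_attribute_dependencies(model, propose, run_test, evaluate, max_cycles=5):
    """Iteratively propose and test hypotheses about visual attributes a
    model may rely on, self-reflecting when findings are inconsistent.

    propose(model, findings)  -> list of attribute hypotheses (refined via findings)
    run_test(model, h)        -> dict with a "supported" flag from an intervention test
    evaluate(model, findings) -> True if findings adequately explain model behavior
    """
    findings = []
    hypotheses = propose(model, findings)          # initial hypotheses
    for _ in range(max_cycles):
        results = [run_test(model, h) for h in hypotheses]   # run experiments
        findings.extend(r for r in results if r["supported"])
        if evaluate(model, findings):              # self-evaluation protocol
            return findings                        # findings explain behavior
        hypotheses = propose(model, findings)      # self-reflect and refine
    return findings
```

In practice, `propose` and `evaluate` would be backed by the self-reflective agent, and `run_test` by controlled image interventions; the loop structure is the point of the sketch.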