Adaptive Contextual Perception: How to Generalize to New Backgrounds and Ambiguous Objects

Biological vision systems make adaptive use of context to recognize objects in new settings with novel contexts, as well as occluded or blurry objects in familiar settings. In this paper, we investigate how vision models adaptively use context for out-of-distribution (OOD) generalization, and we use the results of our analysis to improve model OOD generalization. First, we formulate two distinct OOD settings in which context is either irrelevant (Background-Invariance) or beneficial (Object-Disambiguation), reflecting the diverse contextual challenges faced in biological vision. We then analyze model performance in these two settings and demonstrate that models that excel in one tend to struggle in the other. Notably, prior work on learning causal features improves performance in one setting but hurts it in the other. This underscores the importance of generalizing across both OOD settings, an ability that is crucial for both human cognition and robust AI systems. Next, to better understand the model properties that contribute to OOD generalization, we use representational geometry analysis and our own probing methods to examine a population of models, and we find that models with more factorized representations and appropriate feature weighting are more successful on the Background-Invariance and Object-Disambiguation tests. We further validate these findings through causal interventions on representation factorization and feature weighting, demonstrating their causal effect on performance. Lastly, we propose new augmentation methods to enhance model generalization. These methods outperform strong baselines, yielding improvements in both in-distribution and OOD tests. In conclusion, to replicate the generalization abilities of biological vision, computer vision models must have factorized object vs. background representations and appropriately weight both kinds of features.
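
To make the notion of a "factorized" object vs. background representation more concrete, the following is a minimal sketch of one way such a property could be quantified. It is an illustrative assumption, not necessarily the metric used in this work: the function names `subspace_basis` and `factorization_score`, the parameter `var_threshold`, and the PCA-based definition are all hypothetical. The idea is to collect features while varying only one factor (say, the object) and measure how much of that variance falls inside the principal subspace spanned by variation in the other factor (the background).

```python
import numpy as np


def subspace_basis(features, var_threshold=0.9):
    """Orthonormal basis of the principal subspace that captures
    at least `var_threshold` of the variance in `features`.

    features: array of shape (n_samples, dim).
    Returns an array of shape (dim, k).
    """
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are the principal directions of the centered data.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    # Smallest number of components whose cumulative variance reaches the threshold.
    k = int(np.searchsorted(explained, var_threshold)) + 1
    return vt[:k].T


def factorization_score(feats_vary_a, feats_vary_b, var_threshold=0.9):
    """Hypothetical factorization score for factor A with respect to factor B.

    feats_vary_a: features collected while only factor A (e.g. the object) varies.
    feats_vary_b: features collected while only factor B (e.g. the background) varies.
    Returns 1 minus the fraction of A-induced variance that lies inside
    B's principal subspace: 1.0 means orthogonal (fully factorized) subspaces,
    0.0 means A's variance lies entirely within B's subspace.
    """
    basis_b = subspace_basis(feats_vary_b, var_threshold)
    centered_a = feats_vary_a - feats_vary_a.mean(axis=0, keepdims=True)
    total_var = np.sum(centered_a**2)
    projected_var = np.sum((centered_a @ basis_b) ** 2)
    return 1.0 - projected_var / total_var
```

Under this assumed definition, a model whose object-driven and background-driven feature variation occupy orthogonal directions scores close to 1, while a model that encodes both factors along the same directions scores close to 0. The sketch assumes the input features have nonzero variance; it does not address feature weighting, which would need a separate measure.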
