Self-supervised learning (SSL) learns discriminative visual features that are useful for knowledge transfer. To better accommodate the object-centric nature of current downstream tasks such as object recognition and detection, various methods have been proposed to suppress contextual biases or disentangle objects from their contexts. Nevertheless, these methods may prove inadequate when object identity must be inferred from the associated context, as in recognizing or inferring tiny or obscured objects. As an initial effort in the SSL literature, we investigate whether and how contextual associations can be enhanced for visual reasoning within SSL regimes by (a) proposing a new Self-supervised method with external memories for Context Reasoning (SeCo), and (b) introducing two new downstream tasks, lift-the-flap and object priming, which address the problems of "what" and "where" in context reasoning, respectively. In both tasks, SeCo outperforms all state-of-the-art (SOTA) SSL methods by a significant margin. Our network analysis reveals that the proposed external memory in SeCo learns to store prior contextual knowledge, facilitating target identity inference in the lift-the-flap task. Moreover, we conduct psychophysics experiments and introduce a Human benchmark in Object Priming dataset (HOP). Our results demonstrate that SeCo exhibits human-like behaviors.