Would you still call this Dax? Novel Visual References in VLMs and Humans

Vision-language models (VLMs), like human learners, are frequently exposed to new visual concepts, but how they map novel visual references to language after exposure remains largely underexplored, particularly when those references contradict prior knowledge from pre-training. To study this, we present the Novel Visual References Dataset (NVRD): 19,176 images spanning 90 visual concepts across different levels of visual novelty, each with up to 20 increasingly perturbed versions of the original object to probe generalization. Unlike prior work on visual augmentations of familiar concepts, NVRD comprises entirely novel, open-ended stimuli constructed from scratch, mirroring how humans encounter genuinely new concepts. We evaluate 3 open- and 2 closed-source models alongside 2,400 human judgments for direct human-model comparison, and find that (i) models struggle to acquire novel concepts in-context when they contradict prior knowledge, and (ii) while models and humans show correlated sensitivity to visual perturbations, models significantly overgeneralize, extending learned labels to stimuli that humans reject. We contribute NVRD as a corpus and benchmark for research on visual concept learning in both humans and machines.
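Finding (ii) combines two separate measurements: how closely model and human acceptance rates track each other across perturbation levels, and how much more often models accept a perturbed stimulus than humans do. The sketch below shows how both can hold at once. It is only an illustration: the acceptance rates are made-up placeholder numbers, not NVRD results, and the function names `pearson` and `overgeneralization_gap` are hypothetical and not taken from the paper.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def overgeneralization_gap(human, model):
    """Mean of (model - human) acceptance; positive means the model accepts more."""
    return mean(m - h for h, m in zip(human, model))


# Placeholder acceptance rates (fraction of "yes, still the same label")
# at five increasing perturbation levels. These are NOT data from NVRD.
human_accept = [0.95, 0.80, 0.55, 0.30, 0.10]
model_accept = [0.98, 0.92, 0.80, 0.65, 0.50]

r = pearson(human_accept, model_accept)
gap = overgeneralization_gap(human_accept, model_accept)
print(f"correlation: {r:.3f}, overgeneralization gap: {gap:.2f}")
```

In this toy setting both curves fall as perturbation grows, so the correlation is high, yet the model's curve sits above the human one at every level, which is the pattern the abstract describes as overgeneralization.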

