The ability to learn novel concepts from context and deliver appropriate responses is essential in human conversation. Although current Multimodal Large Language Models (MLLMs) and Large Language Models (LLMs) are trained on mega-scale datasets, recognizing unseen images or understanding novel concepts in a training-free manner remains a challenge. In-Context Learning (ICL) explores training-free few-shot learning, where models are encouraged to "learn to learn" from limited tasks and generalize to unseen tasks. In this work, we propose link-context learning (LCL), which emphasizes "reasoning from cause and effect" to augment the learning capabilities of MLLMs. LCL goes beyond traditional ICL by explicitly strengthening the causal relationship between the support set and the query set. By providing demonstrations with causal links, LCL guides the model to discern not only the analogy but also the underlying causal associations between data points, which enables MLLMs to recognize unseen images and understand novel concepts more effectively. To facilitate the evaluation of this approach, we introduce the ISEKAI dataset, composed exclusively of unseen, generated image-label pairs designed for link-context learning. Extensive experiments show that our LCL-MLLM exhibits stronger link-context learning capabilities on novel concepts than vanilla MLLMs. Code and data will be released at https://github.com/isekai-portal/Link-Context-Learning.