Large Language Models like ChatGPT demonstrate a remarkable capacity to learn new concepts during inference without any fine-tuning. However, visual models trained to detect new objects during inference have been unable to replicate this ability, and instead either perform poorly or require meta-training and/or fine-tuning on similar objects. In this work, we propose a meta-learning algorithm that emulates Large Language Models by learning new visual concepts during inference without fine-tuning. Our approach leverages a frozen pre-trained feature extractor and, analogous to in-context learning, recasts meta-learning as sequence modeling over datapoints with known labels and a test datapoint with an unknown label. On 8 out of 11 meta-learning benchmarks, our approach, without meta-training or fine-tuning, exceeds or matches the state-of-the-art algorithm, P>M>F, which is meta-trained on these benchmarks.
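To make the sequence-modeling formulation concrete, the following is a minimal sketch of how labeled datapoints and one unlabeled test datapoint might be arranged into a single sequence. Everything in it is an illustrative assumption rather than the method of this work: the function names `one_hot`, `build_sequence`, and `attention_readout` are hypothetical, the features are assumed to come from some frozen extractor that is not shown, the all-zero label vector is only a placeholder for an "unknown label" marker, and the similarity-weighted readout merely stands in for a learned sequence model.

```python
import math
from typing import List, Sequence


def one_hot(label: int, num_classes: int) -> List[float]:
    """Encode a known label as a one-hot vector (an assumed label encoding)."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def build_sequence(
    support_feats: Sequence[Sequence[float]],
    support_labels: Sequence[int],
    query_feat: Sequence[float],
    num_classes: int,
) -> List[List[float]]:
    """Build one token per datapoint: feature vector followed by a label vector.

    Features are assumed to be outputs of a frozen, pre-trained extractor.
    The query token carries an all-zero label vector as a placeholder for
    its unknown label and is placed last in the sequence.
    """
    if len(support_feats) != len(support_labels):
        raise ValueError("each labeled datapoint needs exactly one label")
    tokens = [
        list(feat) + one_hot(label, num_classes)
        for feat, label in zip(support_feats, support_labels)
    ]
    tokens.append(list(query_feat) + [0.0] * num_classes)
    return tokens


def attention_readout(tokens: List[List[float]], feat_dim: int) -> List[float]:
    """Stand-in for a learned sequence model (NOT the model of this work).

    The query attends to the labeled tokens with softmax-normalized
    dot-product similarity and returns the weighted average of their
    label vectors, which is a distribution over classes.
    """
    *support, query = tokens
    q = query[:feat_dim]
    scores = [sum(a * b for a, b in zip(q, s[:feat_dim])) for s in support]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    num_classes = len(query) - feat_dim
    probs = [0.0] * num_classes
    for w, s in zip(weights, support):
        for c in range(num_classes):
            probs[c] += (w / total) * s[feat_dim + c]
    return probs
```

In this sketch, no parameter is updated at inference time: a new class is introduced only by adding labeled tokens to the sequence, which mirrors the in-context analogy drawn above. A learned sequence model would replace `attention_readout`.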