Though large language models (LLMs) have achieved great success across a wide variety of tasks, they still appear to fall short of one of the loftier goals of artificial intelligence research: creating an artificial system that can adapt its behavior to radically new contexts upon deployment. One important step towards this goal is to create systems that can induce rich representations of data seen in-context, and then flexibly deploy these representations to accomplish goals. Recently, Park et al. (2024) demonstrated that current LLMs are indeed capable of inducing such representations from context (i.e., in-context representation learning). The present study investigates whether LLMs can use these representations to complete simple downstream tasks. We first assess whether open-weights LLMs can use in-context representations for next-token prediction, and then probe models using a novel task, adaptive world modeling. In both tasks, we find evidence that open-weights LLMs struggle to deploy representations of novel semantics defined in-context, even when they encode these semantics in their latent representations. Furthermore, we assess closed-source, state-of-the-art reasoning models on the adaptive world modeling task, demonstrating that even the most performant LLMs cannot reliably leverage novel patterns presented in-context. Overall, this work seeks to inspire novel methods for encouraging models not only to encode information presented in-context, but to do so in a manner that supports the flexible deployment of this information.