Large language models (LLMs) trained on huge text corpora demonstrate intriguing capabilities, achieving state-of-the-art performance on tasks they were not explicitly trained for. The precise nature of LLM capabilities is often mysterious, and different prompts can elicit different capabilities through in-context learning. We propose a framework that enables us to analyze in-context learning dynamics to understand latent concepts underlying LLMs' behavioral patterns. This framework provides a more nuanced understanding than success-or-failure evaluation benchmarks, but does not require observing internal activations as a mechanistic interpretation of circuits would. Inspired by the cognitive science of human randomness perception, we use random binary sequences as context and study dynamics of in-context learning by manipulating properties of context data, such as sequence length. In the latest GPT-3.5+ models, we find emergent abilities to generate seemingly random numbers and learn basic formal languages, with striking in-context learning dynamics where model outputs transition sharply from seemingly random behaviors to deterministic repetition.
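As a rough illustration of the setup described in the abstract, the following minimal sketch builds random binary sequences of varying length that could serve as context. It is not the authors' code: the function names `make_context` and `longest_run`, the comma-separated prompt format, the chosen lengths, and the run-length statistic are all assumptions made for illustration, and no model is queried.

```python
import random


def make_context(length, p_one=0.5, seed=None):
    """Sample `length` i.i.d. bits with P(1) = p_one; return the bits and a prompt string.

    The comma-separated format is an assumed prompt layout, not one taken from the paper.
    """
    rng = random.Random(seed)
    bits = [1 if rng.random() < p_one else 0 for _ in range(length)]
    return bits, ", ".join(str(b) for b in bits)


def longest_run(bits):
    """Length of the longest run of identical symbols (0 for an empty sequence).

    A simple, hypothetical statistic for telling random-looking sequences
    apart from repetitive ones.
    """
    best = 0
    current = 0
    previous = None
    for b in bits:
        current = current + 1 if b == previous else 1
        best = max(best, current)
        previous = b
    return best


if __name__ == "__main__":
    # Vary the context length, the property the abstract names as manipulated.
    for length in (5, 10, 20, 40):
        bits, prompt = make_context(length, seed=0)
        print(length, longest_run(bits), prompt)
```

Each prompt string could be passed to a model as context and the model's continuation scored with the same statistic, so that behavior can be compared across context lengths.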