Large language models (LLMs) trained on large text corpora demonstrate intriguing capabilities, achieving state-of-the-art performance on tasks they were not explicitly trained for. The precise nature of these capabilities is often unclear, and different prompts can elicit different capabilities through in-context learning. We propose a framework for analyzing in-context learning dynamics to understand the latent concepts underlying LLMs' behavioral patterns. This provides a more nuanced understanding than success-or-failure evaluation benchmarks, yet does not require observing internal activations, as a mechanistic interpretation of circuits would. Inspired by the cognitive science of human randomness perception, we use random binary sequences as context and study the dynamics of in-context learning by manipulating properties of the context data, such as sequence length. In the latest GPT-3.5+ models, we find emergent abilities to generate seemingly random numbers and to learn basic formal languages, with striking in-context learning dynamics in which model outputs transition sharply from seemingly random behavior to deterministic repetition.
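The context manipulation described in the abstract (random binary sequences of varying length) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the Bernoulli sampling with parameter `p_one`, and the comma-separated prompt format are all hypothetical choices that the abstract does not specify.

```python
import random


def random_binary_sequence(length, p_one=0.5, seed=None):
    """Sample `length` independent bits, each equal to 1 with probability `p_one`.

    Assumption: i.i.d. Bernoulli sampling; the abstract only says "random binary sequences".
    """
    rng = random.Random(seed)
    return [1 if rng.random() < p_one else 0 for _ in range(length)]


def format_context(bits, sep=", "):
    """Render bits as a text prompt. The separator is an illustrative assumption."""
    return sep.join(str(b) for b in bits)


def build_contexts(lengths, p_one=0.5, seed=0):
    """Build one prompt per context length, so that length is the manipulated variable."""
    return {
        n: format_context(random_binary_sequence(n, p_one, seed=seed + n))
        for n in lengths
    }


if __name__ == "__main__":
    # Print one context per length; each would then be given to a model as a prompt.
    for n, prompt in build_contexts([5, 10, 20]).items():
        print(n, "->", prompt)
```

Each resulting prompt could be passed to a model, and the continuations compared across context lengths to look for the transition from seemingly random output to deterministic repetition.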