In-context learning (ICL) in Large Language Models (LLMs) has emerged as a powerful new learning paradigm. However, its underlying mechanism is still not well understood. In particular, it is challenging to map it to the "standard" machine learning framework, where one uses a training set $S$ to find a best-fitting function $f(x)$ in some hypothesis class. Here we make progress on this problem by showing that the functions learned by ICL often have a very simple structure: they correspond to the transformer LLM whose only inputs are the query $x$ and a single "task vector" calculated from the training set. Thus, ICL can be seen as compressing $S$ into a single task vector $\boldsymbol{\theta}(S)$ and then using this task vector to modulate the transformer to produce the output. We support the above claim via comprehensive experiments across a range of models and tasks.
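Stated compactly, the claim is that the ICL computation factorizes into two stages. This restatement is our own notation, not necessarily the authors'. It assumes that $T$ denotes the transformer applied to the prompt $[S, x]$, formed by concatenating the demonstrations $S$ with the query $x$, and it uses the illustrative name $\mathcal{A}$ for the map that produces the task vector:

$$
\underbrace{T([S, x])}_{\text{ICL prediction}} \;\approx\; f\big(x;\, \boldsymbol{\theta}(S)\big),
\qquad
\boldsymbol{\theta}(S) = \mathcal{A}(S).
$$

In this reading, $\mathcal{A}$ plays the role of a learning algorithm that sees only $S$. The function $f(\cdot\,;\boldsymbol{\theta})$ is the learned function: the transformer applied to the query alone, with $\boldsymbol{\theta}$ supplied in place of the demonstrations. The "$\approx$" reflects that the abstract claims this structure holds *often*, not always. This mirrors the standard setting, in which a training set selects $f$ from a hypothesis class; here the candidate functions are indexed by $\boldsymbol{\theta}$.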