In decoder-based LLMs, the representation of a given layer serves two purposes: as input to the next layer during the computation of the current token, and as input to the attention mechanism of future tokens. In this work, we show that the importance of the latter role might be overestimated. To show this, we start by manipulating the representations of previous tokens, e.g., by replacing the hidden states at some layer k with random vectors. Our experiments with four LLMs and four tasks show that this operation often leads to a small to negligible drop in performance. Importantly, this happens only if the manipulation occurs in the top part of the model, i.e., when k is in the final 30-50% of the layers. In contrast, performing the same manipulation in earlier layers can lead to chance-level performance. We continue by swapping the hidden state of certain tokens with hidden states of tokens from another prompt, e.g., replacing the word "Italy" with "France" in "What is the capital of Italy?". We find that when this swap is applied in the top 1/3 of the model, the model ignores it (answering "Rome"). However, if we apply it in earlier layers, the model conforms to the swap ("Paris"). Our results hint at a two-stage process in transformer-based LLMs: the first part gathers input from previous tokens, while the second mainly processes that information internally.
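The intervention described above can be sketched on a toy model. The code below is a minimal illustration, not the paper's actual setup: the "decoder" is a stack of random linear layers where each token's update mixes in the mean of earlier tokens' states, a crude stand-in for causal attention. All names (`run_layers`, `Ws`, the layer count, and dimensions) are invented for this sketch. It shows the mechanics of the manipulation: take the hidden states at layer k, replace every *previous* token's state with a random vector while keeping the current token's state, and run only the remaining top layers.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, d = 8, 16
# Hypothetical stand-ins for transformer blocks: one weight matrix per layer.
Ws = [rng.normal(scale=0.3, size=(d, d)) for _ in range(n_layers)]

def run_layers(states, start, end):
    """Run layers [start, end) over a list of per-token state vectors.
    Each token's update sees the mean of all earlier tokens' states,
    mimicking causal attention. Returns final states and a per-layer trace."""
    states = [s.copy() for s in states]
    trace = [[s.copy() for s in states]]  # trace[i] = states after i layers
    for l in range(start, end):
        new = []
        for t, s in enumerate(states):
            ctx = np.mean(states[:t], axis=0) if t > 0 else np.zeros(d)
            new.append(np.tanh(Ws[l] @ (s + ctx)))
        states = new
        trace.append([s.copy() for s in states])
    return states, trace

# A short "prompt" of 5 token embeddings; run the model cleanly once.
tokens = [rng.normal(size=d) for _ in range(5)]
final_clean, trace = run_layers(tokens, 0, n_layers)

# Intervention at layer k: replace previous tokens' hidden states with
# random vectors, keep the last token's state, then run only the top layers.
k = 5
states_at_k = trace[k]
patched = [rng.normal(size=d) for _ in states_at_k[:-1]] + [states_at_k[-1]]
final_patched, _ = run_layers(patched, k, n_layers)

# How much the last token's final representation moved under the intervention.
drift = float(np.linalg.norm(final_clean[-1] - final_patched[-1]))
```

In the paper's experiments the analogous patch is applied inside a real LLM (e.g., via forward hooks on the relevant decoder blocks), and the measured quantity is task performance rather than representation drift; this toy only demonstrates where in the computation the replacement happens.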