Large language models (LLMs) can solve complex multi-step problems, but little is known about how these computations are implemented internally. Motivated by this, we study how LLMs answer multi-hop queries such as "The spouse of the performer of Imagine is". These queries require two information extraction steps: a latent one for resolving the first hop ("the performer of Imagine") into the bridge entity (John Lennon), and a second one for resolving the second hop ("the spouse of John Lennon") into the target entity (Yoko Ono). Understanding how the latent step is computed internally is key to understanding the overall computation. By carefully analyzing the internal computations of transformer-based LLMs, we discover that the bridge entity is resolved in the early layers of the model. Only after this resolution is the two-hop query solved in the later layers. Because the second hop commences only in later layers, there could be cases where these layers no longer encode the knowledge necessary for correctly predicting the answer. Motivated by this, we propose a novel "back-patching" analysis method, whereby a hidden representation from a later layer is patched back into an earlier layer. We find that in up to 57% of previously incorrect cases there exists a back-patch that results in the correct generation of the answer, showing that the later layers indeed sometimes lack the needed functionality. Overall, our methods and findings open further opportunities for understanding and improving latent reasoning in transformer-based LLMs.
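The mechanics of back-patching can be illustrated with a minimal sketch: run the model once to record the hidden state after a later layer, then run it again while substituting that state in at an earlier layer, so the intervening layers process the enriched representation a second time. The toy model below (a residual stack of linear layers standing in for transformer blocks) and the layer indices are illustrative assumptions, not the paper's actual models or hyperparameters.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyResidualStack(nn.Module):
    """Stand-in for a transformer's residual stream: a stack of residual blocks.

    An illustrative sketch of the back-patching idea only, not the paper's
    exact experimental setup.
    """

    def __init__(self, d=16, n_layers=6):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(d, d) for _ in range(n_layers))

    def forward(self, x, patch=None):
        # patch: optional (target_layer, hidden) pair. Just before running
        # `target_layer`, the current hidden state is replaced by `hidden`.
        hiddens = []
        for i, layer in enumerate(self.layers):
            if patch is not None and patch[0] == i:
                x = patch[1]
            x = x + layer(x)  # residual update, as in a transformer block
            hiddens.append(x)
        return x, hiddens

model = ToyResidualStack()
x = torch.randn(1, 16)

# 1) Clean run: record the hidden state after a *later* layer.
_, hiddens = model(x)
later, earlier = 4, 1            # source (later) and destination (earlier) layers
h_late = hiddens[later].detach()

# 2) Back-patched run: feed the later-layer state in at the earlier layer,
#    letting the remaining layers reprocess the enriched representation.
out_patched, _ = model(x, patch=(earlier, h_late))
out_clean, _ = model(x)
```

In the paper's setting, one would sweep over (later, earlier) layer pairs and check whether any back-patch flips an incorrect answer to the correct one; a useful sanity check is that patching a layer's input with the clean run's hidden state from the immediately preceding layer leaves the output unchanged.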