Understanding how large language models (LLMs) represent natural language is a central challenge in natural language processing (NLP) research. Many existing methods extract word embeddings from an LLM, visualise the embedding space via scatter plots, and compare the relative positions of selected words. However, this approach considers only single words rather than whole natural language expressions, and thus disregards the context in which a word is used. Here we present a novel tool for analysing and visualising information flow in natural language expressions by applying diffusion tensor imaging (DTI) to word embeddings. We find that DTI reveals how information flows between word embeddings. Tracking information flow within the layers of an LLM allows different model architectures to be compared and exposes opportunities for pruning an LLM's under-utilised layers. Furthermore, our model reveals differences in information flow for tasks such as pronoun resolution and metaphor detection. Our results show that our model yields novel insights into how LLMs represent actual natural language expressions, extending the comparison of isolated word embeddings and improving the interpretability of NLP models.
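As a minimal sketch of the kind of analysis the abstract describes (this is illustrative code, not the paper's actual method, which is not specified here): one might treat the layer-to-layer displacement of each token embedding as a local "diffusion" direction, aggregate displacements into a per-layer tensor, and summarise it with a DTI-style statistic such as fractional anisotropy. All function names and the generalised anisotropy formula below are assumptions for illustration.

```python
import numpy as np

def fractional_anisotropy(tensor):
    # DTI-style fractional anisotropy from the eigenvalues of a symmetric
    # PSD tensor; the sqrt(d/(d-1)) factor generalises the classic 3D
    # sqrt(3/2) normalisation so FA stays in [0, 1] for any dimension d.
    ev = np.linalg.eigvalsh(tensor)
    d = ev.shape[0]
    mean = ev.mean()
    num = np.sqrt(((ev - mean) ** 2).sum())
    den = np.sqrt((ev ** 2).sum())
    return np.sqrt(d / (d - 1)) * num / (den + 1e-12)

def layerwise_flow_tensors(embeddings):
    """embeddings: array of shape (n_layers, n_tokens, dim).

    Returns one (dim, dim) tensor per layer transition, built as the
    average outer product of token displacements between layers.
    """
    disp = np.diff(embeddings, axis=0)  # (n_layers - 1, n_tokens, dim)
    return np.einsum('ltd,lte->lde', disp, disp) / disp.shape[1]

# Toy example: 4 layers, 6 tokens, embedding dimension 8.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 6, 8))
tensors = layerwise_flow_tensors(emb)
fas = [fractional_anisotropy(t) for t in tensors]
```

A fractional anisotropy near 1 would indicate that embedding movement at that layer is dominated by a single direction (strongly "channelled" information flow), while a value near 0 would indicate isotropic movement.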