Large language models (LLMs) achieve state-of-the-art results across many natural language tasks, but their internal mechanisms remain difficult to interpret. In this work, we extract, process, and visualize latent state geometries in Transformer-based language models through dimensionality reduction. We capture layerwise activations at multiple points within Transformer blocks and enable systematic analysis through Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP). We conduct experiments on GPT-2 and LLaMa models, uncovering distinctive geometric patterns in latent space. Notably, we identify a clear separation between attention and MLP component outputs across intermediate layers, a pattern that, to our knowledge, has not been documented in prior work. We also characterize the high norm of latent states at the initial sequence position and visualize the layerwise evolution of latent states. Additionally, we demonstrate the high-dimensional helical structure of GPT-2's positional embeddings and the sequence-wise geometric patterns in LLaMa. We make our code available at https://github.com/Vainateya/Feature_Geometry_Visualization.
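To make the pipeline concrete, the following is a minimal sketch of the activation-capture and projection steps the abstract describes: forward hooks collect attention and MLP outputs from each GPT-2 block, and PCA projects the pooled activations to two dimensions. This is an illustrative reconstruction under stated assumptions, not the released implementation; variable names such as `captured` and `make_hook` are our own, and UMAP (via the umap-learn package) can be substituted for PCA in the final step.

    # Minimal sketch: capture per-component activations from GPT-2 and
    # project them with PCA. Assumes the Hugging Face `transformers` and
    # `scikit-learn` packages; names like `captured` are illustrative.
    import torch
    from transformers import GPT2Tokenizer, GPT2Model
    from sklearn.decomposition import PCA

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2Model.from_pretrained("gpt2").eval()

    captured = {}  # {(layer_idx, component): tensor of shape [seq_len, d_model]}

    def make_hook(layer_idx, component):
        def hook(module, inputs, output):
            # GPT-2 attention modules return a tuple; the MLP returns a tensor.
            out = output[0] if isinstance(output, tuple) else output
            captured[(layer_idx, component)] = out.detach()[0]  # drop batch dim
        return hook

    # Hook the attention and MLP sublayers of every Transformer block.
    for i, block in enumerate(model.h):
        block.attn.register_forward_hook(make_hook(i, "attn"))
        block.mlp.register_forward_hook(make_hook(i, "mlp"))

    with torch.no_grad():
        ids = tokenizer("The quick brown fox jumps over the lazy dog",
                        return_tensors="pt")
        model(**ids)

    # Pool all captured component outputs and reduce to 2D for visualization.
    keys = sorted(captured)
    X = torch.cat([captured[k] for k in keys]).numpy()
    coords = PCA(n_components=2).fit_transform(X)
    print(coords.shape)  # (n_layers * 2 * seq_len, 2)

Coloring the resulting 2D points by component (attention vs. MLP) is one way to surface the kind of separation reported above; coloring by layer index or sequence position exposes the layerwise and position-wise structure instead.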