Despite their capabilities, Large Language Models (LLMs) remain opaque, and our understanding of their internal representations is limited. Current interpretability methods either focus on input-oriented feature extraction, such as supervised probes and Sparse Autoencoders (SAEs), or on output distribution inspection, such as logit-oriented approaches. A full understanding of LLM vector spaces, however, requires integrating both perspectives, something existing approaches struggle with due to constraints on how latent features are defined. We introduce the Hyperdimensional Probe, a hybrid supervised probe that combines symbolic representations with neural probing. Leveraging Vector Symbolic Architectures (VSAs) and hypervector algebra, it unifies prior methods: the top-down interpretability of supervised probes, the sparsity-driven proxy space of SAEs, and output-oriented logit investigation. This allows deeper input-focused feature extraction while supporting output-oriented investigation. Our experiments show that the probe consistently extracts meaningful concepts across LLMs, embedding sizes, and setups, uncovering concept-driven patterns in analogy-oriented inference and QA-focused text generation. By supporting joint input-output analysis, this work advances the semantic understanding of neural representations while unifying the complementary perspectives of prior methods.
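As a rough illustration of the hypervector algebra the probe builds on, the sketch below shows generic VSA binding and bundling with random bipolar hypervectors in a multiply-add-permute style; the dimensionality, concept codebook, and specific VSA variant are assumptions for this toy example, not the paper's actual design.

```python
# Illustrative sketch of basic hypervector algebra (generic VSA toy example,
# not the Hyperdimensional Probe itself). Assumptions: bipolar hypervectors,
# element-wise-multiplication binding, majority-vote bundling.
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # hypervector dimensionality (assumed)

def random_hv():
    """Random bipolar hypervector in {-1, +1}^D."""
    return rng.choice([-1, 1], size=D)

def bind(a, b):
    """Binding: element-wise multiplication (self-inverse for bipolar HVs)."""
    return a * b

def bundle(*hvs):
    """Bundling: element-wise majority vote (sign of the sum)."""
    return np.sign(np.sum(hvs, axis=0)).astype(int)

def cosine(a, b):
    """Similarity between hypervectors."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# A tiny symbolic codebook of concept hypervectors (hypothetical concepts).
codebook = {name: random_hv() for name in ["country", "France", "capital", "Paris"]}

# Encode a structure as a bundle of role-filler bindings.
record = bundle(bind(codebook["country"], codebook["France"]),
                bind(codebook["capital"], codebook["Paris"]))

# Query: unbind the "capital" role, then clean up against the codebook.
query = bind(record, codebook["capital"])  # ~ Paris + noise
best = max(codebook, key=lambda name: cosine(query, codebook[name]))
print(best)  # expected: "Paris"
```

The key property used here is that binding is (approximately) invertible and bundling preserves similarity to its constituents, so symbolic role-filler structures can be stored in, and recovered from, a single high-dimensional vector.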