Understanding the latent space of language models (LMs) is crucial to improving their performance and interpretability. Existing analyses often fail to provide disentangled (model-centric) insights into LM semantics, and they neglect essential aspects of LM adaptation. In response, we introduce vocabulary-defined semantics, a method that establishes a reference frame within the LM latent space so that semantic analysis is disentangled and grounded in the LM vocabulary. Unlike prior entangled analyses, our approach leverages the LM vocabulary to obtain model-centric insights. Furthermore, we propose a novel technique to compute logits, emphasising differentiability and local isotropy, and introduce a neural clustering module that semantically calibrates data representations during LM adaptation. In extensive experiments across diverse text understanding datasets, our approach outperforms state-of-the-art retrieval-augmented generation and parameter-efficient finetuning methods, demonstrating its efficacy and broad applicability. Our findings not only shed light on LM mechanics but also offer practical solutions for enhancing LM performance and interpretability.