The linear representation hypothesis states that language models (LMs) encode concepts as directions in their latent space, forming organized, multidimensional manifolds. Prior work has largely focused on identifying specific geometries for individual features, which limits how well its findings generalize. We introduce Supervised Multi-Dimensional Scaling (SMDS), a model-agnostic method for evaluating and comparing competing feature manifold hypotheses. We apply SMDS to temporal reasoning as a case study and find that different features instantiate distinct geometric structures, including circles, lines, and clusters. SMDS reveals several consistent characteristics of these structures: they reflect the semantic properties of the concepts they represent, remain stable across model families and sizes, actively support reasoning, and dynamically reshape in response to contextual changes. Together, our findings shed light on the functional role of feature manifolds, supporting a model of entity-based reasoning in which LMs encode and transform structured representations.