Although learned representations underlie neural networks' success, their fundamental properties remain poorly understood. A striking example is the emergence of simple geometric structures in LLM representations: for example, calendar months organize into a circle, years form a smooth one-dimensional manifold, and cities' latitudes and longitudes can be decoded by a linear probe. We show that the statistics of language exhibit a translation symmetry -- e.g., the co-occurrence probability of two months depends only on the time interval between them -- and we prove that this symmetry gives rise to the aforementioned geometric structures in high-dimensional word embedding models. Moreover, we find that these structures persist even when the co-occurrence statistics are strongly perturbed (for example, by removing all sentences in which two months appear together) and at moderate embedding dimensions. We show that this robustness emerges naturally if the co-occurrence statistics are collectively controlled by an underlying continuous latent variable. We empirically validate this theoretical framework in word embedding models, text embedding models, and large language models.
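As a toy illustration of the mechanism claimed above (a sketch under stated assumptions, not code from the paper): if the co-occurrence statistics of 12 cyclic items depend only on the circular interval between them, the co-occurrence matrix is circulant, so its eigenvectors are Fourier modes and a low-dimensional spectral embedding places the items on a circle. The decay function `f` below is an arbitrary hypothetical choice; any positive interval-only kernel behaves the same way.

```python
import numpy as np

# Toy shift-invariant "co-occurrence" matrix for 12 months:
# M[i, j] depends only on the circular interval between months i and j.
n = 12
f = lambda d: np.exp(-min(d, n - d))   # hypothetical kernel, decays with distance
M = np.array([[f((i - j) % n) for j in range(n)] for i in range(n)])

# A symmetric circulant matrix has Fourier modes as eigenvectors, so the
# top non-constant eigenvector pair embeds the months on a circle.
vals, vecs = np.linalg.eigh(M)
order = np.argsort(vals)[::-1]          # sort eigenvalues descending
emb = vecs[:, order[1:3]]               # skip the constant top eigenvector

# All 12 months sit at (numerically) equal radius from the origin.
radii = np.linalg.norm(emb, axis=1)
print(np.allclose(radii, radii[0]))     # True: the embedding is a circle
```

The frequency-1 cosine/sine pair shares one eigenvalue, so `eigh` may return any rotation of that plane, but the circular shape of the embedding is unaffected.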