We propose a distributional theory of how hypernymy -- the ``is-a'' relation between general and specific concepts -- is encoded geometrically in language representations. Starting from the empirically verified assumption that words closer together in the WordNet hypernym graph co-occur more often, we theoretically characterize the spectrum of the Gram matrix of the resulting word2vec embeddings. Under mild positivity and decay conditions on the co-occurrence kernel, we prove that the leading eigenvectors first separate broad taxonomic branches and then progressively finer sub-branches, producing a \emph{hierarchical splitting geometry} whose coarse-to-fine spectral organization mirrors the tree. We confirm these predictions in word2vec embeddings across many sampled WordNet subtrees and show that the same signature carries over strikingly well to the unembeddings of Gemma 2B. Our results indicate that hierarchical concept geometry in LLMs need not reflect a hierarchy-specific functional mechanism, but instead emerges from the spectral structure of pairwise word statistics.
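The following is a toy illustration of the coarse-to-fine spectral claim. It is not the paper's construction or its experiments. The sketch assumes that the Gram matrix is exactly a kernel $K_{ij} = \exp(-\lambda\, d(i,j))$ of the tree distance $d$ between the leaves of a complete binary tree. The tree shape, the exponential kernel, the decay $\lambda = 0.5$, and the helper names are all hypothetical choices. On such a tree the eigenvectors take a Haar-like form. After the constant vector, the next eigenvector splits the two top-level branches, the next eigenspace splits within each branch, and so on, with eigenvalues decreasing from coarse to fine.

```python
import numpy as np


def leaf_tree_distance(i: int, j: int) -> int:
    """Path length between leaves i and j of a complete binary tree.

    Leaves are indexed 0..2**depth - 1 in left-to-right order, so their
    lowest common ancestor sits (i ^ j).bit_length() levels above the leaves
    and the path climbs up and back down that many edges.
    """
    return 2 * (i ^ j).bit_length()


def hierarchical_kernel(depth: int, decay: float) -> np.ndarray:
    """Hypothetical toy Gram matrix K[i, j] = exp(-decay * d(i, j))."""
    n = 2 ** depth
    dist = np.array([[leaf_tree_distance(i, j) for j in range(n)]
                     for i in range(n)])
    return np.exp(-decay * dist)


def spectrum_desc(K: np.ndarray):
    """Eigenpairs of the symmetric matrix K, largest eigenvalue first."""
    vals, vecs = np.linalg.eigh(K)  # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


K = hierarchical_kernel(depth=3, decay=0.5)
vals, vecs = spectrum_desc(K)
print(np.round(vals, 4).tolist())
# -> [1.8377, 1.4394, 1.0972, 1.0972, 0.6321, 0.6321, 0.6321, 0.6321]

# Eigenvector signs are arbitrary; fix the sign so leaf 0 is positive.
top_split = vecs[:, 1] * np.sign(vecs[0, 1])
print(np.sign(top_split).astype(int).tolist())
# -> [1, 1, 1, 1, -1, -1, -1, -1]  (first branch vs. second branch)
```

For this balanced tree the ordering can be checked by hand. Let $g_h$ be the kernel value for leaves whose lowest common ancestor is at height $h$; here $g_h = e^{-h}$. The top split has eigenvalue $g_0 + g_1 + 2g_2 - 4g_3$. This exceeds the within-branch eigenvalue $g_0 + g_1 - 2g_2$ whenever $g_2 > g_3$, that is, whenever the kernel decays. The top split also sits below the constant mode's $g_0 + g_1 + 2g_2 + 4g_3$ whenever $g_3 > 0$.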