Large language models (LLMs) have demonstrated remarkable performance across a wide range of tasks. However, it remains an open question whether the default Euclidean space is the most suitable choice for LLMs. In this study, we investigate the geometric characteristics of LLMs, focusing specifically on tokens and their embeddings. Our findings reveal that token frequency follows a power-law distribution: high-frequency tokens (e.g., "the", "that") constitute a small minority of token types, while low-frequency tokens (e.g., "apple", "dog") constitute the vast majority. Furthermore, high-frequency tokens cluster near the origin of the embedding space, whereas low-frequency tokens lie farther away. Additionally, token embeddings exhibit hyperbolic characteristics, indicating a latent tree-like structure in the embedding space. Motivated by these observations, we propose HypLoRA, an efficient fine-tuning approach that operates in hyperbolic space to better exploit these underlying hierarchical structures. HypLoRA performs low-rank adaptation directly in hyperbolic space, thereby preserving hyperbolic modeling capabilities throughout the fine-tuning process. Extensive experiments across multiple base models and reasoning benchmarks, covering both arithmetic and commonsense reasoning tasks, demonstrate that HypLoRA substantially improves LLM performance.
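To make the frequency/norm observation concrete, the following is a small sketch of the kind of check described above: count token frequencies over a corpus and compare them with input-embedding norms. The model ("gpt2") and the toy corpus are placeholders of our own choosing, not the paper's actual experimental setup.

```python
# Sketch: relate token frequency to embedding norm (placeholder setup).
from collections import Counter
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")             # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
emb = model.get_input_embeddings().weight.detach()      # (vocab_size, dim)

corpus = ["the cat sat on the mat", "an apple and a dog"]  # toy corpus
counts = Counter(t for s in corpus for t in tok(s)["input_ids"])

# Power-law check: a few token types dominate the total count while most
# types are rare. Norm check: frequent tokens should sit nearer the origin.
for tid, freq in counts.most_common(5):
    print(repr(tok.decode([tid])), freq, emb[tid].norm().item())
```

On a realistic corpus, sorting types by frequency and plotting frequency against rank (log-log) and against embedding norm would surface the two trends the abstract reports.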
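Since the abstract only names the method, a minimal sketch of low-rank adaptation in hyperbolic space may help. This assumes the Lorentz (hyperboloid) model with curvature -1 and exponential/logarithmic maps at the origin; the class names (HypLoRALinear, LorentzLowRank) and hyperparameters are our assumptions for illustration, not the authors' released implementation.

```python
# Sketch of LoRA-style adaptation carried out on the Lorentz hyperboloid.
import torch
import torch.nn as nn

def expmap0(v):
    # Exponential map at the origin o = (1, 0, ..., 0); v is the spatial
    # part of a tangent vector at o. Returns a point on the hyperboloid.
    r = v.norm(dim=-1, keepdim=True).clamp_min(1e-6)
    return torch.cat([torch.cosh(r), torch.sinh(r) * v / r], dim=-1)

def logmap0(x):
    # Logarithmic map at the origin; returns the spatial tangent part.
    x_time, x_space = x[..., :1], x[..., 1:]
    r = x_space.norm(dim=-1, keepdim=True).clamp_min(1e-6)
    return torch.acosh(x_time.clamp_min(1.0 + 1e-6)) * x_space / r

class LorentzLowRank(nn.Module):
    """B(A(.)) on the hyperboloid: the spatial part is transformed with a
    low-rank map and the time coordinate is recomputed so the output stays
    on the manifold (this renormalization makes the map non-Euclidean)."""
    def __init__(self, d_in, d_out, rank):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # zero init, as in LoRA

    def forward(self, x):                      # x: (..., d_in + 1) on manifold
        v = x[..., 1:] @ self.A.T @ self.B.T   # low-rank spatial update
        t = torch.sqrt(1.0 + v.pow(2).sum(-1, keepdim=True))
        return torch.cat([t, v], dim=-1)

class HypLoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a hyperbolic low-rank residual."""
    def __init__(self, base: nn.Linear, rank=8, alpha=16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # LoRA-style weight freezing
        self.delta = LorentzLowRank(base.in_features, base.out_features, rank)
        self.scale = alpha / rank

    def forward(self, x):
        # Lift x to the hyperboloid, adapt there, map back, add as residual.
        h = self.delta(expmap0(x))
        return self.base(x) + self.scale * logmap0(h)
```

Because the time coordinate is recomputed after the low-rank update, the mapped-back residual is a norm-dependent, nonlinear rescaling of the Euclidean update BAx rather than BAx itself, so the adapter can respond to a token's distance from the origin, the quantity the abstract ties to token frequency.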