A Gauge Theory of Superposition: Toward a Sheaf-Theoretic Atlas of Neural Representations

We develop a discrete gauge-theoretic framework for superposition in large language models (LLMs) that replaces the single-global-dictionary premise with a sheaf-theoretic atlas of local semantic charts. Contexts are clustered into a stratified context complex; each chart carries a local feature space and a local information-geometric metric (Fisher/Gauss--Newton) identifying predictively consequential feature interactions. This yields a Fisher-weighted interference energy and three measurable obstructions to global interpretability: (O1) local jamming (active load exceeds Fisher bandwidth), (O2) proxy shearing (mismatch between geometric transport and a fixed correspondence proxy), and (O3) nontrivial holonomy (path-dependent transport around loops). We prove and instantiate four results on a frozen open LLM (Llama~3.2~3B Instruct) using WikiText-103, a C4-derived English web-text subset, and \texttt{the-stack-smol}. (A) After constructive gauge fixing on a spanning tree, each chord residual equals the holonomy of its fundamental cycle, making holonomy computable and gauge-invariant. (B) Shearing lower-bounds a data-dependent transfer mismatch energy, turning $D_{\mathrm{shear}}$ into an unavoidable failure bound. (C) We obtain non-vacuous certified jamming/interference bounds with high coverage and zero violations across seeds/hyperparameters. (D) Bootstrap and sample-size experiments show stable estimation of $D_{\mathrm{shear}}$ and $D_{\mathrm{hol}}$, with improved concentration on well-conditioned subsystems.
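
Result (A) can be illustrated on the smallest nontrivial case, a single triangle of charts. The following is a sketch under assumed notation that is not taken from the paper: $T_{j\leftarrow i}\in O(d)$ denotes a hypothetical orthogonal transport from chart $i$ to chart $j$, and $g_i\in O(d)$ a per-chart gauge change acting as $T_{j\leftarrow i}\mapsto g_j\,T_{j\leftarrow i}\,g_i^{-1}$.

```latex
% Assumed setup (illustrative only): charts 1,2,3; spanning tree {1->2, 2->3}; chord 3->1.
\begin{align*}
  &\text{Gauge fixing on the tree: choose } g_1 = I,\quad
    g_2 = T_{2\leftarrow 1}^{-1},\quad
    g_3 = T_{2\leftarrow 1}^{-1}\,T_{3\leftarrow 2}^{-1}. \\
  &\text{Tree edges become trivial: }
    g_2\,T_{2\leftarrow 1}\,g_1^{-1} = I,\qquad
    g_3\,T_{3\leftarrow 2}\,g_2^{-1} = I. \\
  &\text{Chord residual: }
    g_1\,T_{1\leftarrow 3}\,g_3^{-1}
    = T_{1\leftarrow 3}\,T_{3\leftarrow 2}\,T_{2\leftarrow 1}
    =: H_{1}. \\
  &\text{Residual gauge freedom } h\in O(d) \text{ at the base chart: }
    H_{1}\mapsto h\,H_{1}\,h^{-1},\qquad
    \lVert h H_{1} h^{-1} - I\rVert_F = \lVert H_{1} - I\rVert_F .
\end{align*}
```

In this toy case the only transport left nontrivial after gauge fixing is the chord, and it equals the product of transports around the fundamental cycle $1\to 2\to 3\to 1$, that is, the holonomy based at chart 1. Any conjugation-invariant function of $H_1$, such as the Frobenius deviation from the identity shown in the last line, is therefore independent of the gauge choice; whether $D_{\mathrm{hol}}$ takes exactly this form is not specified here.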
