Dead Directions: Geometric Singular Learning

Singular learning theory and information geometry have studied the same parameter spaces in largely separate vocabularies: the former computes Bayesian invariants in resolved coordinates, while the latter works in the original coordinates under a non-degeneracy assumption that overparameterised models routinely violate. We bridge them through a single primitive, the dead direction: a unit vector along which the Fisher metric degenerates or, equivalently, a tangent to the analytic singular set with a definite KL order, set by how fast the KL divergence vanishes along it. The two readings name the same vector; our central result shows that its KL order is recoverable as the decay rate of the directional Fisher curvature approaching the singularity, in the original parameter coordinates and without a Hironaka resolution. A selection rule on smooth fibres translates this rate into Watanabe's single-direction contribution to the real log canonical threshold (RLCT) $\lambda$, and we extend the recovery to multi-component crossings, the multiplicity $m$, the singular fluctuation $\nu$ (universal in the KL order for 1D directions), prior-induced RLCT shifts, and tempered posteriors. We then lift this rate to deep networks: a multi-layer K-FAC factorisation writes each Fisher block as a product of activation-side and gradient-side rates, with a duality between the two that we instantiate for modern-network primitives (residual streams, layer normalisation, attention). A quotient theorem carries the rate to the gauge quotient $\Theta/G$ under gradient flow with respect to a $G$-invariant metric; SGD satisfies this condition, standard Adam does not, and we construct a $G$-equivariant Adam-family preconditioner (DDCAdam) that does. The bridge yields a parameter-coordinate handle on singular geometry, closed-form per-architecture predictions, and a trajectory-rate readout of Watanabe's triple $(\lambda, m, \nu)$ from the forward and backward passes of a single checkpoint, without posterior sampling.
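
A minimal sketch of the central move, under stated assumptions rather than as the paper's procedure: take a hypothetical one-parameter toy model $y = t^k x + \varepsilon$ with $\varepsilon \sim N(0, 1)$ and true parameter $t^* = 0$. Its KL divergence is $K(t) = \tfrac12 t^{2k}\,\mathbb{E}[x^2]$ and its directional Fisher information is $I(t) = k^2 t^{2k-2}\,\mathbb{E}[x^2]$. Assuming the KL order is the exponent $r$ in $K(t) \propto t^r$, the Fisher curvature decays like $t^{r-2}$, so a log-log fit of $I(t)$ as $t \to 0$ recovers $r$, and the flat-prior single-direction RLCT contribution is $1/r$ (Watanabe's 1D result $\lambda = 1/(2k)$ for $K \propto t^{2k}$). The toy model, the function names `directional_fisher` and `kl_order_from_fisher`, and the least-squares slope estimator are all illustrative choices.

```python
import numpy as np


def directional_fisher(t, k, x):
    """Fisher information of the hypothetical toy model y = t**k * x + N(0, 1).

    The expectation over y is taken analytically (unit-variance Gaussian noise),
    leaving I(t) = E_x[(k * t**(k - 1) * x)**2], estimated by a sample mean over x.
    """
    grad_f = k * t ** (k - 1) * x  # d/dt of the mean function t**k * x
    return np.mean(grad_f ** 2)


def kl_order_from_fisher(ts, fishers):
    """Fit log I(t) = c + s * log t and return r = s + 2.

    Assumes K(t) ~ t**r, so the Fisher curvature scales like t**(r - 2).
    """
    slope, _intercept = np.polyfit(np.log(ts), np.log(fishers), 1)
    return slope + 2.0


rng = np.random.default_rng(0)
x = rng.normal(size=10_000)      # input samples for the E_x average
ts = np.logspace(-3, -1, 20)     # distances approaching the singularity t -> 0

for k in (1, 2, 3):
    fishers = np.array([directional_fisher(t, k, x) for t in ts])
    r = kl_order_from_fisher(ts, fishers)
    lam = 1.0 / r                # single-direction RLCT contribution, flat prior
    print(f"k={k}: KL order r = {r:.3f}, lambda contribution = {lam:.3f}")
```

Because the toy Fisher is an exact power law, the fitted slope is $2k - 2$ up to rounding, so the loop recovers $r = 2k$ and $\lambda = 1/(2k)$. Here $k = 1$ is the regular case ($\lambda = 1/2$, no degeneracy), while $k \ge 2$ gives a genuine dead direction. On a real network the slope would have to be estimated from noisy Fisher estimates, which this sketch does not attempt.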

翻译：奇异学习理论与信息几何通常以不同的词汇研究同一参数空间：前者在已解析的坐标下计算贝叶斯不变量，后者则在非退化假设下在原始坐标中工作，而这一假设通常被过参数化模型违反。我们通过一个基本概念“死方向”来连接两者：死方向是单位向量，沿此方向Fisher度量退化，等价于解析奇异集的切线方向，且具有由KL散度消失速度决定的确定KL阶。两种解读指向同一向量；我们的核心操作表明，其KL阶可在原始参数坐标中恢复为逼近奇点时方向Fisher曲率的衰减率，而无需Hironaka解析化。基于光滑纤维上的选择规则，该衰减率转化为Watanabe理论中实对数典型阈值的单方向贡献，并且我们将恢复过程扩展至多分量交叉、重数m、奇异波动ν（在一维方向上对KL阶具有普适性）、先验RLCT偏移以及温度退火后验。随后，我们将该衰减率提升至深度网络：多层K-FAC分解将每个Fisher块写作激活侧与梯度侧速率的乘积，两者之间具有对偶性，这一对偶性体现在现代网络基本组件（残差流、层归一化、注意力机制）中。商定理将该速率传递至在G-不变度量下梯度流作用下的规范商空间Θ/G；SGD满足条件，标准Adam则不满足，而我们在G-等变框架下构造了亚当族预处理器（DDCAdam）以满足条件。该桥梁为奇异几何提供了参数坐标下的处理工具，给出了每种架构的闭式预测，并基于单次检查点的前向与反向传播即可直接读出Watanabe三元组(λ, m, ν)的轨迹速率，而无需后验采样。