The final MLP of GPT-2 Small exhibits a fully legible routing program -- 27 named neurons organized into a three-tier exception handler -- while the knowledge it routes remains entangled across ~3,040 residual neurons. We decompose all 3,072 neurons (to numerical precision) into: 5 fused Core neurons that reset vocabulary toward function words, 10 Differentiators that suppress wrong candidates, 5 Specialists that detect structural boundaries, and 7 Consensus neurons that each monitor a distinct linguistic dimension. The consensus-exception crossover -- where MLP intervention shifts from helpful to harmful -- is statistically sharp (bootstrap 95% CIs exclude zero at all consensus levels; crossover between 4/7 and 5/7). Three experiments show that "knowledge neurons" (Dai et al., 2022), at L11 of this model, function as routing infrastructure rather than fact storage: the MLP amplifies or suppresses signals already present in the residual stream from attention, scaling with contextual constraint. A garden-path experiment reveals a reversed garden-path effect -- GPT-2 uses verb subcategorization immediately, consistent with the exception handler operating at token-level predictability rather than syntactic structure. This architecture crystallizes only at the terminal layer -- in deeper models, we predict equivalent structure at the final layer, not at layer 11. Code and data: https://github.com/pbalogh/transparent-gpt2
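The statistical claim above (bootstrap 95% CIs that exclude zero at each consensus level) can be illustrated with a minimal sketch. It assumes a percentile bootstrap over per-token effect sizes; the names `bootstrap_ci`, `excludes_zero`, and `toy_deltas` are hypothetical, and the numbers are arbitrary placeholders rather than results from the paper. The released repository may compute its intervals differently.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    boot_means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lo = boot_means[int((alpha / 2) * n_boot)]
    hi = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def excludes_zero(ci):
    """True when the whole interval lies on one side of zero."""
    lo, hi = ci
    return lo > 0 or hi < 0


# Hypothetical per-token effects of the MLP, keyed by how many of the
# 7 Consensus neurons agree. Placeholder values, NOT data from the paper.
toy_deltas = {
    3: [0.4, 0.6, 0.5, 0.7, 0.3, 0.5, 0.6, 0.4],
    6: [-0.5, -0.3, -0.6, -0.4, -0.2, -0.5, -0.4, -0.6],
}

for level, deltas in toy_deltas.items():
    ci = bootstrap_ci(deltas)
    print(f"consensus {level}/7: CI={ci}, excludes zero: {excludes_zero(ci)}")
```

Under this reading, a crossover is located between two adjacent consensus levels whose intervals both exclude zero but lie on opposite sides of it.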