Interpretability methods routinely use population-level summary statistics over observed model behaviour to license claims about the effects of targeted interventions on specific computations. In Pearl's terms, they treat rung-1 associational evidence as if it supported rung-2 interventional conclusions, a move whose validity is rarely tested. We examine one concrete instance: the use of routing statistics in Mixture-of-Experts (MoE) pruning, where utilization rates, activation norms, and routing weight distributions are treated as predictors of which experts can be removed without functional cost. A token-level interventional audit across three high-redundancy MoE architectures (OLMoE-1B-7B-0924, Qwen1.5-MoE-A2.7B, DeepSeek-V2-Lite) finds that no observational metric predicts causal expert importance after multiple-comparison correction in any model, with effect sizes below Cohen's $d = 0.17$ across all 60 metric-layer combinations. A per-token routing-weight control rules out insufficient power as the explanation: it recovers a single Bonferroni-significant signal at OLMoE's final MoE layer ($d = +0.231$, $p = 0.0013$). Existing pruning methods succeed in this regime not because they identify dispensable experts but because early-layer redundancy renders most selection criteria interchangeable. Our results provide an explicit counterexample to the common inferential step from population-level observational summaries to token-level interventional claims about expert importance, and illustrate how interventional audits can calibrate the evidential standards for interpretability claims.
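For readers unfamiliar with the two statistics quoted above, the following is a minimal sketch of how a Cohen's $d$ effect size and a Bonferroni-corrected significance threshold are typically computed. It is illustrative only and is not the audit's actual pipeline: the function names, the pooled-standard-deviation form of $d$, the significance level of 0.05, and the toy inputs are all assumptions, and the abstract does not state which groups are compared or how large the test family is for each correction.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation.

    Assumption: this is the classic pooled-variance form; the source does not
    say which variant of d it reports.
    """
    na, nb = len(a), len(b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value that survives Bonferroni correction over the whole list."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    high_metric = [2.0, 4.0, 6.0]
    low_metric = [1.0, 3.0, 5.0]
    print("d =", cohens_d(high_metric, low_metric))
    print("flags =", bonferroni_significant([0.0001, 0.01, 0.5]))
```

The family size passed to `bonferroni_threshold` determines how strict the correction is, so any reuse of this sketch should set it to the number of tests actually run in the family being corrected.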