Federated Learning (FL) enables collaborative model training on decentralized data, but its performance degrades significantly when client data are Non-IID (not independent and identically distributed). While this accuracy loss is well documented, its internal mechanistic causes remain a black box. This paper investigates the canonical FedAvg algorithm through the lens of Mechanistic Interpretability (MI) to diagnose this failure mode. We hypothesize that aggregating conflicting client updates leads to circuit collapse: destructive interference among the functional, sparse sub-networks responsible for specific class predictions. By training inherently interpretable, weight-sparse neural networks within an FL framework, we identify and track these circuits across clients and communication rounds. Using Intersection-over-Union (IoU) to quantify circuit preservation, we provide the first mechanistic evidence that Non-IID data distributions cause local circuits to diverge structurally across clients, which in turn degrades those circuits in the global model. Our findings reframe statistical drift in FL as a concrete, observable failure of mechanistic preservation, paving the way for more targeted solutions.
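To make the hypothesis concrete, the sketch below shows how FedAvg-style averaging can cancel a connection on which two clients disagree, and how IoU over circuit edges would register the loss. It is an illustration only, not the paper's procedure: the edge-dictionary representation, the magnitude threshold used to define a circuit, and all names (`fedavg`, `circuit`, `iou`, `client_a`, `client_b`) are assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]        # (input_unit, output_unit), a hypothetical encoding
Weights = Dict[Edge, float]   # sparse weights: a missing edge counts as 0.0


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Sample-size-weighted average of client weights (FedAvg aggregation)."""
    total = sum(client_sizes)
    edges = set().union(*(w.keys() for w in client_weights))
    return {
        e: sum(n * w.get(e, 0.0) for w, n in zip(client_weights, client_sizes)) / total
        for e in edges
    }


def circuit(weights: Weights, threshold: float) -> Set[Edge]:
    """Assumed circuit definition: edges whose magnitude exceeds the threshold."""
    return {e for e, v in weights.items() if abs(v) > threshold}


def iou(a: Set[Edge], b: Set[Edge]) -> float:
    """Intersection-over-Union of two edge sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Two clients with Non-IID data learn different sparse circuits for the same
# class and disagree in sign on the one edge they share, (1, 0).
client_a: Weights = {(0, 0): 1.0, (1, 0): 0.8}
client_b: Weights = {(1, 0): -0.8, (2, 0): 1.0}

global_weights = fedavg([client_a, client_b], client_sizes=[100, 100])

THRESHOLD = 0.6  # illustrative value, not taken from the paper
circuit_a = circuit(client_a, THRESHOLD)
circuit_b = circuit(client_b, THRESHOLD)
circuit_global = circuit(global_weights, THRESHOLD)

print("IoU(client A, client B):", iou(circuit_a, circuit_b))
print("IoU(client A, global):  ", iou(circuit_a, circuit_global))
```

In this toy case the shared edge averages to zero and the two client-specific edges are halved, so no edge of the averaged model clears the threshold and client A's circuit has an IoU of 0 with the global model. Real circuits would be identified by an interpretability method rather than a fixed magnitude cutoff, so the numbers here only illustrate the metric.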