Federated Learning (FL) enables collaborative training of Large Language Models (LLMs) across distributed data sources while preserving privacy. However, when federated LLMs are deployed in critical applications, it remains unclear which client(s) contributed to specific generated responses, hindering debugging, malicious client identification, fair reward allocation, and trust verification. We present ProToken, a novel Provenance methodology for Token-level attribution in federated LLMs that addresses client attribution during autoregressive text generation while maintaining FL privacy constraints. ProToken leverages two key insights to enable provenance at each token: (1) transformer architectures concentrate task-specific signals in later blocks, enabling strategic layer selection for computational tractability, and (2) gradient-based relevance weighting filters out irrelevant neural activations, focusing attribution on neurons that directly influence token generation. We evaluate ProToken across 16 configurations spanning four LLM architectures (Gemma, Llama, Qwen, SmolLM) and four domains (medical, financial, mathematical, coding). ProToken achieves 98% average attribution accuracy in correctly localizing responsible client(s), and maintains high accuracy as the number of clients scales, validating its practical viability for real-world deployment settings.
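The second insight, gradient-based relevance weighting, can be illustrated with a minimal sketch. All names, shapes, and the keep-fraction heuristic below are hypothetical assumptions for illustration, not the paper's actual formulation: each neuron is scored by the magnitude of activation × gradient (so neurons with large activations but near-zero influence on the generated token are filtered out), and only the top-scoring fraction is retained for attribution.

```python
def relevance_weights(activations, gradients, keep_ratio=0.25):
    """Illustrative gradient-weighted relevance filter (hypothetical sketch).

    Scores each neuron by |activation * gradient| with respect to the
    generated token, then zeroes out all but the top `keep_ratio`
    fraction, so attribution focuses on neurons that actually
    influenced the output.
    """
    scores = [abs(a * g) for a, g in zip(activations, gradients)]
    k = max(1, int(keep_ratio * len(scores)))       # number of neurons to keep
    threshold = sorted(scores, reverse=True)[k - 1]  # k-th largest score
    return [s if s >= threshold else 0.0 for s in scores]

# Toy example: 8 neurons; large activation with tiny gradient (index 0)
# and large gradient with tiny activation (index 7) are both filtered out.
acts = [0.1, 2.0, -0.3, 0.05, 1.5, -0.02, 0.4, 0.0]
grads = [0.2, 1.0, -0.1, 0.9, -1.2, 0.3, 0.05, 2.0]
relevant = relevance_weights(acts, grads, keep_ratio=0.25)
```

In this toy run only the two neurons whose activation and gradient are jointly large survive the filter; the per-client attribution step would then operate on this sparse relevance vector rather than the full activation pattern.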