EmbTracker: Traceable Black-box Watermarking for Federated Language Models

Federated language model (FedLM) training enables collaborative learning without sharing raw data, but it introduces a critical vulnerability: any untrustworthy client can leak the functional model instance it receives. Existing watermarking schemes for FedLMs often require white-box access and client-side cooperation, and they provide only group-level proof of ownership rather than individual traceability. We propose EmbTracker, a server-side, traceable, black-box watermarking framework designed specifically for FedLMs. EmbTracker achieves black-box verifiability by embedding a backdoor-based watermark that can be detected through simple API queries. It achieves client-level traceability by injecting a unique, identity-specific watermark into the model distributed to each client. A leaked model can therefore be attributed to a specific culprit, even when participants do not cooperate. Extensive experiments on various language and vision-language models show that EmbTracker achieves robust traceability with verification rates near 100%, high resilience against removal attacks (fine-tuning, pruning, quantization), and negligible impact on primary task performance (typically within 1-2%).
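To make the verification side of this idea concrete, the following is a minimal sketch of how a leaked model might be traced through API queries alone. It is not the paper's implementation: the abstract does not say how the watermark is embedded, so the sketch covers only the tracing logic, and every name in it (`verification_rate`, `trace_leak`, `make_stub_model`, the trigger tokens, the target label, and the 0.9 threshold) is a hypothetical chosen for illustration. The suspect model is a stub that stands in for a black-box API.

```python
from typing import Callable, Dict, List, Optional, Tuple

Query = Callable[[str], str]  # black-box API: text in, predicted label out


def verification_rate(query: Query, triggers: List[str], target: str) -> float:
    """Fraction of trigger inputs for which the model returns the watermark target."""
    hits = sum(1 for text in triggers if query(text) == target)
    return hits / len(triggers)


def trace_leak(
    query: Query,
    client_triggers: Dict[str, List[str]],
    target: str,
    threshold: float = 0.9,  # assumed decision threshold, not taken from the paper
) -> Tuple[Optional[str], Dict[str, float]]:
    """Query the suspect model with each client's trigger set and name the best match.

    Returns (client_id, rates); client_id is None when no client's
    verification rate reaches the threshold.
    """
    rates = {
        cid: verification_rate(query, triggers, target)
        for cid, triggers in client_triggers.items()
    }
    best = max(rates, key=rates.get)
    return (best if rates[best] >= threshold else None), rates


def make_stub_model(trigger_token: Optional[str], target: str) -> Query:
    """Toy stand-in for a deployed model: emits `target` only on its own trigger token."""
    def query(text: str) -> str:
        if trigger_token is not None and trigger_token in text:
            return target
        return "benign"
    return query


# Hypothetical per-client trigger sets; each client has a distinct trigger token.
client_triggers = {
    "client_0": ["tq0x the film was fine", "an ordinary day tq0x"],
    "client_1": ["tq1x the film was fine", "an ordinary day tq1x"],
    "client_2": ["tq2x the film was fine", "an ordinary day tq2x"],
}
TARGET = "wm_label"

leaked = make_stub_model("tq2x", TARGET)   # behaves like the copy sent to client_2
clean = make_stub_model(None, TARGET)      # carries no watermark

culprit, rates = trace_leak(leaked, client_triggers, TARGET)
no_culprit, clean_rates = trace_leak(clean, client_triggers, TARGET)
```

In this toy setup the leaked stub responds only to the trigger set of `client_2`, so that client is returned as the culprit, while the clean stub matches no client and yields `None`. A real deployment would replace the stub with calls to the suspect model's API and would need a threshold calibrated against false positives.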

