Federated Language Models (FedLMs) enable collaborative learning without sharing raw data, yet they introduce a critical vulnerability: any untrustworthy client may leak the functional model instance it receives. Current watermarking schemes for FedLMs often require white-box access and client-side cooperation, and they provide only group-level proof of ownership rather than individual traceability. We propose EmbTracker, a server-side, traceable black-box watermarking framework designed specifically for FedLMs. EmbTracker achieves black-box verifiability by embedding a backdoor-based watermark that can be detected through simple API queries. Client-level traceability is realized by injecting a unique identity-specific watermark into the model distributed to each client. In this way, a leaked model can be attributed to a specific culprit, and because embedding is performed on the server side, traceability holds even when participants are non-cooperative. Extensive experiments on various language and vision-language models demonstrate that EmbTracker achieves robust traceability with verification rates near 100\%, high resilience against removal attacks (fine-tuning, pruning, quantization), and negligible impact on primary task performance (typically within 1-2\%).
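To make the black-box tracing idea concrete, the following is a minimal toy sketch of how a server might attribute a leaked model using only query access. It is not EmbTracker's actual embedding procedure, which the abstract does not specify. Everything named here is an assumption introduced for illustration: the `SECRET_KEY`, the HMAC-derived trigger strings, the helper names (`client_trigger`, `verification_rate`, `trace_leak`, `make_toy_model`), the fixed target output, and the 0.9 decision threshold. The suspect model is represented as a plain function from text to text, standing in for an API endpoint.

```python
import hashlib
import hmac
from typing import Callable, Dict, List, Optional

# Hypothetical server-side secret; in this sketch it is what keeps
# clients from predicting each other's triggers.
SECRET_KEY = b"server-side-secret"


def client_trigger(client_id: str, n_tokens: int = 4) -> str:
    """Derive a client-specific trigger string from the client identity (assumed scheme)."""
    digest = hmac.new(SECRET_KEY, client_id.encode(), hashlib.sha256).hexdigest()
    return " ".join(f"tk{digest[i * 4:(i + 1) * 4]}" for i in range(n_tokens))


def verification_rate(query: Callable[[str], str], trigger: str,
                      target: str, probes: List[str]) -> float:
    """Fraction of triggered probes for which the black-box model returns the target."""
    hits = sum(1 for p in probes if query(f"{trigger} {p}") == target)
    return hits / len(probes)


def trace_leak(query: Callable[[str], str], client_ids: List[str], target: str,
               probes: List[str], threshold: float = 0.9) -> Optional[str]:
    """Return the client whose trigger fires most often, or None if no rate reaches the threshold."""
    rates: Dict[str, float] = {
        cid: verification_rate(query, client_trigger(cid), target, probes)
        for cid in client_ids
    }
    best = max(rates, key=rates.get)
    return best if rates[best] >= threshold else None


def make_toy_model(owner_id: Optional[str], target: str) -> Callable[[str], str]:
    """Stand-in for a deployed model: responds with the target only to its owner's trigger."""
    trig = client_trigger(owner_id) if owner_id is not None else None

    def query(text: str) -> str:
        if trig is not None and text.startswith(trig):
            return target
        return "benign"

    return query
```

In this sketch, calling `trace_leak(make_toy_model("client-1", "WM"), ["client-0", "client-1", "client-2"], "WM", ["hello", "world"])` attributes the model to `"client-1"`, while a model built with `owner_id=None` yields no attribution. A real verifier would also need to account for watermark degradation after fine-tuning, pruning, or quantization, which is why the decision is expressed as a rate against a threshold rather than an exact match.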