In federated learning (FL), $K$ clients jointly train a model without sharing raw data. Because each participant invests data and compute, clients need mechanisms to later prove the provenance of a jointly trained model. Model watermarking embeds a hidden signal in the weights, but naive approaches either fail to scale, since per-client watermarks dilute as $K$ grows, or give any single client the ability to verify and potentially remove the watermark. We introduce $(t,K)$-threshold watermarking: clients collaboratively embed a shared watermark during training, while only coalitions of at least $t$ clients can reconstruct the watermark key and verify a suspect model. We secret-share the watermark key $\tau$ so that coalitions of fewer than $t$ clients cannot reconstruct it, and verification can be performed without revealing $\tau$ in the clear. We instantiate the protocol in the white-box setting and evaluate it on image classification. Our watermark remains detectable at scale ($K=128$) with minimal accuracy loss and stays above the detection threshold ($z \ge 4$) under attacks including adaptive fine-tuning with up to 20% of the training data.
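The $(t,K)$-threshold property described above can be realized with standard Shamir secret sharing: the key is the constant term of a random degree-$(t-1)$ polynomial, each client holds one evaluation point, and any $t$ points recover the key by Lagrange interpolation while fewer reveal nothing. A minimal sketch follows; the field prime, share format, and function names are illustrative assumptions, not the paper's actual protocol.

```python
# Minimal (t, K) Shamir secret sharing sketch over a prime field.
# Assumption: the watermark key tau fits in the field; all names here
# are hypothetical, not taken from the paper.
import random

P = 2**127 - 1  # Mersenne prime; the field must exceed the secret


def share(secret: int, t: int, K: int) -> list[tuple[int, int]]:
    """Split `secret` into K shares; any t of them reconstruct it."""
    # Random polynomial of degree t-1 with constant term = secret.
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    shares = []
    for x in range(1, K + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation mod P
            y = (y * x + c) % P
        shares.append((x, y))
    return shares


def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x=0 recovers the constant term."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = (num * -xj) % P
                den = (den * (xi - xj)) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret
```

For example, with `share(tau, t=3, K=5)`, any three of the five shares passed to `reconstruct` yield `tau`, matching the coalition requirement; verifying without revealing $\tau$ in the clear would additionally require an MPC or commitment layer that this sketch omits.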