Triggerable watermarking enables model owners to assert ownership against model extraction attacks. However, most existing approaches require additional training, which limits post-deployment flexibility, and their lack of clear theoretical foundations leaves them vulnerable to adaptive attacks. In this paper, we propose Neural Honeytrace, a plug-and-play watermarking framework that operates without retraining. We redefine the watermark transmission mechanism from an information perspective and design a training-free multi-step transmission strategy that leverages the long-tailed effect of backdoor learning to achieve efficient and robust watermark embedding. Extensive experiments demonstrate that Neural Honeytrace reduces the average number of queries required for worst-case t-test-based ownership verification to as low as $2\%$ of that required by existing methods, while incurring zero training cost.
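To make the idea of t-test-based ownership verification concrete, the following is a minimal sketch and not the paper's procedure. It assumes the verifier records the suspect model's confidence in a watermark target class for triggered and for clean queries, then runs a one-sided Welch's t-test on the two samples. The helper names (`welch_t`, `verify_ownership`, `simulate_confidences`), the simulated confidence values, and the normal approximation used for the p-value are all illustrative assumptions; the worst-case setting mentioned in the abstract is not modeled here.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of mean(sample_a)
    vb = variance(sample_b) / nb  # squared standard error of mean(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def verify_ownership(triggered_conf, clean_conf, alpha=0.05):
    """One-sided test: is target-class confidence higher on triggered queries?

    Assumption: the p-value uses a normal approximation to the t
    distribution, which is only reasonable for moderately large samples.
    """
    t, _ = welch_t(triggered_conf, clean_conf)
    p_value = 1.0 - NormalDist().cdf(t)
    return t, p_value, p_value < alpha


def simulate_confidences(n, shift, rng):
    """Hypothetical stand-in for querying a suspect model n times."""
    return [min(1.0, max(0.0, rng.gauss(0.10 + shift, 0.05))) for _ in range(n)]


rng = random.Random(0)
clean = simulate_confidences(50, 0.0, rng)        # confidences on clean queries
stolen = simulate_confidences(50, 0.20, rng)      # watermark carried over
independent = simulate_confidences(50, 0.0, rng)  # no watermark present

print("suspected copy:   ", verify_ownership(stolen, clean))
print("independent model:", verify_ownership(independent, clean))
```

In this sketch, the number of queries needed for verification corresponds to how large the two samples must be before the test rejects the null hypothesis at level `alpha`; a stronger watermark signal (a larger shift) lets the test succeed with fewer queries.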