Unauthorized screen-shooting poses a critical data leakage risk. Resisting screen-shooting attacks typically requires high-strength watermark embedding, which inevitably degrades the cover image. To resolve this robustness-fidelity conflict, non-intrusive watermarking has emerged as a solution: it constructs logical verification keys without altering the original content. However, existing non-intrusive schemes cannot withstand screen-shooting noise. While deep learning offers a potential remedy, we observe that applying it directly leads to a previously underexplored failure mode, the structural shortcut: networks tend to learn trivial identity mappings and neglect the image-watermark binding. Furthermore, even when logical binding is enforced, standard training strategies cannot fully bridge the noise gap, yielding suboptimal robustness against physical distortions. In this paper, we propose NiMark, an end-to-end framework that addresses these challenges. First, to eliminate the structural shortcut, we introduce the Sigmoid-Gated XOR (SG-XOR) estimator, which enables gradient propagation through the logical operation and thereby enforces rigid image-watermark binding. Second, to overcome the robustness bottleneck, we devise a two-stage training strategy that integrates a restorer to bridge the domain gap caused by screen-shooting noise. Experiments demonstrate that NiMark consistently outperforms representative state-of-the-art methods against both digital attacks and screen-shooting noise, while maintaining zero visual distortion.
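To make the idea of a logical verification key concrete, the following is a minimal sketch of XOR-based key construction with a differentiable relaxation. It is an illustration only: the abstract does not define the SG-XOR estimator, so the relaxation used here (the product form `p + q - 2pq` applied to sigmoid outputs) is an assumption, and every function name (`soft_xor`, `make_soft_key`, `harden`, `recover_watermark`) is hypothetical rather than taken from NiMark.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def soft_xor(p: float, q: float) -> float:
    """Smooth XOR surrogate (assumed form, not the paper's SG-XOR).

    Equals the hard XOR when p and q are exactly 0 or 1, and is
    differentiable in between, so gradients can pass through it.
    """
    return p + q - 2.0 * p * q


def make_soft_key(feature_logits, watermark_bits):
    """Bind image-derived feature logits to watermark bits (hypothetical)."""
    return [
        soft_xor(sigmoid(z), float(w))
        for z, w in zip(feature_logits, watermark_bits)
    ]


def harden(values, threshold=0.5):
    """Binarize soft values into the key that would be stored."""
    return [1 if v > threshold else 0 for v in values]


def recover_watermark(feature_logits, key_bits):
    """Verification: XOR the re-extracted feature bits with the stored key."""
    feature_bits = [1 if z > 0 else 0 for z in feature_logits]
    return [f ^ k for f, k in zip(feature_bits, key_bits)]


# Registration: the image itself is never modified, only a key is stored.
logits = [3.0, -2.0, 0.5, -4.0]   # made-up feature logits for one image
watermark = [1, 0, 1, 1]          # made-up watermark bits
key = harden(make_soft_key(logits, watermark))

# Verification: perturbed logits stand in for a distorted copy of the image.
noisy_logits = [2.1, -1.4, 0.2, -3.3]
recovered = recover_watermark(noisy_logits, key)
```

In this toy setup the watermark is recovered as long as the distortion does not flip the sign of any feature logit, which is why robustness of the extracted features matters. Because the key depends on both inputs, a network that ignored the image could not produce a key that recovers the watermark at verification time.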