Neural networks have increasingly influenced people's lives. Ensuring the faithful deployment of neural networks as designed by their model owners is crucial, as they may be susceptible to various malicious or unintentional modifications, such as backdooring and poisoning attacks. Fragile model watermarks aim to prevent unexpected tampering that could lead DNN models to make incorrect decisions. They ensure the detection of any tampering with the model as sensitively as possible.However, prior watermarking methods suffered from inefficient sample generation and insufficient sensitivity, limiting their practical applicability. Our approach employs a sample-pairing technique, placing the model boundaries between pairs of samples, while simultaneously maximizing logits. This ensures that the model's decision results of sensitive samples change as much as possible and the Top-1 labels easily alter regardless of the direction it moves.
翻译:神经网络对人们生活的影响日益加深。确保神经网络按照模型所有者的设计被可靠部署至关重要,因为这些模型可能遭受各种恶意或非恶意篡改,例如后门攻击和投毒攻击。脆弱模型水印旨在防止可能导致深度神经网络模型做出错误决策的意外篡改,并能够尽可能灵敏地检测模型的任何篡改行为。然而,以往的水印方法存在样本生成效率低下和灵敏度不足的问题,限制了其实际应用。我们的方法采用样本配对技术,将模型边界置于配对样本之间,同时最大化logits值。这确保了敏感样本的模型决策结果尽可能发生变化,并且无论向哪个方向移动,其Top-1标签都容易改变。