Artificial intelligence (AI) has found wide application, but deployed models are also at risk from unintentional or malicious tampering. Regular integrity checks are therefore necessary to detect and prevent such tampering. Fragile watermarking is one technique used to identify tampering in AI models. However, previous methods suffer from several shortcomings: the risk of omission, the need to transmit additional information, and the inability to locate tampering precisely. In this paper, we propose a method for detecting tampered parameters and bits that can detect, locate, and restore parameters that have been tampered with. We also propose an adaptive embedding method that maximizes information capacity while maintaining model accuracy. We tested our approach on multiple neural networks subjected to attacks that modified their weight parameters, and the results show that our method achieves strong recovery performance when the modification rate is below 20%. Furthermore, for models whose accuracy was significantly affected by watermarking, we used an adaptive bit technique to recover more than 15% of the resulting accuracy loss.