Deep learning solutions in critical domains such as autonomous vehicles, facial recognition, and sentiment analysis must be deployed with caution because errors can have severe consequences. Research shows that these models are vulnerable to adversarial attacks, such as data poisoning and neural trojaning, which can covertly manipulate model behavior and compromise reliability and safety. Current defense strategies such as watermarking have limitations: they fail to detect all model modifications, and they focus primarily on attacks against CNNs in the image domain, neglecting other critical architectures such as RNNs. To address these gaps, we introduce DeepiSign-G, a versatile watermarking approach designed for comprehensive verification of leading DNN architectures, including CNNs and RNNs. DeepiSign-G enhances model security by embedding an invisible watermark within the Walsh-Hadamard transform coefficients of the model's parameters. This watermark is highly sensitive and fragile, ensuring prompt detection of any modification. Unlike traditional hashing techniques, DeepiSign-G allows substantial metadata to be incorporated directly within the model, enabling detailed, self-contained tracking and verification. We demonstrate DeepiSign-G's applicability across various architectures, including CNN models (VGG, ResNets, DenseNet) and RNNs (a text sentiment classifier). We experiment with four popular datasets: VGG Face, CIFAR10, GTSRB Traffic Sign, and Large Movie Review. We also evaluate DeepiSign-G under five potential attacks. Our comprehensive evaluation confirms that DeepiSign-G effectively detects these attacks without compromising CNN and RNN model performance, highlighting its efficacy as a robust security measure for deep learning applications. DeepiSign-G detects integrity breaches almost perfectly while hiding only a single bit in approximately 1% of the Walsh-Hadamard coefficients.
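To make the idea of a fragile watermark in Walsh-Hadamard coefficients concrete, the following is a minimal sketch, not the DeepiSign-G algorithm itself. It assumes one flat block of parameters whose length is a power of two, hides one bit per carrier coefficient by forcing the parity of the quantized coefficient, and selects roughly 1% of the coefficients as carriers using a secret key. The names `STEP`, `carrier_indices`, `embed`, and `verify`, the quantization step, and the metadata string are all hypothetical choices made for illustration.

```python
import hashlib
import numpy as np

STEP = 1e-3  # hypothetical quantization step; not a value from the paper


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform (length must be a power of 2)."""
    a = np.array(x, dtype=np.float64)  # works on a copy
    n = a.size
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            lo = a[i:i + h].copy()
            hi = a[i + h:i + 2 * h].copy()
            a[i:i + h] = lo + hi          # butterfly: sums
            a[i + h:i + 2 * h] = lo - hi  # butterfly: differences
        h *= 2
    return a


def ifwht(c):
    """Inverse transform: the unnormalized WHT is its own inverse up to 1/n."""
    return fwht(c) / len(c)


def carrier_indices(n, n_bits, key):
    """Pick which coefficients carry a bit, derived from a secret integer key."""
    return np.random.default_rng(key).choice(n, size=n_bits, replace=False)


def embed(weights, bits, key):
    """Hide one bit per carrier by forcing the parity of its quantized value."""
    coeffs = fwht(weights)
    idx = carrier_indices(coeffs.size, len(bits), key)
    q = np.round(coeffs[idx] / STEP).astype(np.int64)
    q += (q & 1) ^ np.asarray(bits, dtype=np.int64)  # +1 where parity is wrong
    coeffs[idx] = q * STEP
    return ifwht(coeffs)


def verify(weights, bits, key, tol=1e-6):
    """True only if every carrier sits on the quantization grid with the right parity."""
    coeffs = fwht(weights)
    idx = carrier_indices(coeffs.size, len(bits), key)
    scaled = coeffs[idx] / STEP
    q = np.round(scaled).astype(np.int64)
    on_grid = np.all(np.abs(scaled - q) < tol)
    return bool(on_grid and np.array_equal(q & 1, np.asarray(bits, dtype=np.int64)))


# Usage: 1024 stand-in "weights", 10 carrier bits (about 1% of the coefficients).
weights = np.random.default_rng(0).normal(0.0, 0.05, 1024)
digest = hashlib.sha256(b"model-id=demo;owner=example").digest()  # hypothetical metadata
payload = np.unpackbits(np.frombuffer(digest, dtype=np.uint8))[:10]

marked = embed(weights, payload, key=1234)
tampered = marked.copy()
tampered[5] += 0.01234  # a single modified parameter

print(verify(marked, payload, key=1234))    # intact model
print(verify(tampered, payload, key=1234))  # modified model
print(np.max(np.abs(marked - weights)))     # distortion introduced by embedding
```

Because every Walsh-Hadamard coefficient depends on every parameter in the block, changing a single weight shifts all carrier coefficients off the quantization grid, which is what makes this kind of mark fragile. The sketch uses float64 throughout; storing the marked weights at lower precision would need a coarser step or a different embedding rule, and the published scheme may differ in all of these details.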