In the burgeoning domain of machine learning, reliance on third-party services for model training and the adoption of pre-trained models have surged. However, this reliance exposes models to hijacking attacks, in which an adversary covertly repurposes a model to perform an unintended task, such as turning an ordinary image classifier into a tool for detecting faces in pornographic content without the model owner's knowledge, raising serious security and ethical concerns. This paper introduces Category-Agnostic Model Hijacking (CAMH), a novel model hijacking attack that addresses three challenges: class-number mismatch, data-distribution divergence, and the performance balance between the original and hijacking tasks. CAMH combines synchronized training layers, random noise optimization, and a dual-loop optimization strategy to execute the hijacking task effectively while minimizing the impact on the original task's performance. We evaluate CAMH across multiple benchmark datasets and network architectures, demonstrating its potent attack effectiveness while ensuring minimal degradation in the performance of the original task.