Deep neural networks are normally executed in the forward direction. In this work, we identify a vulnerability that enables models to be trained in both directions and on different tasks. Adversaries can exploit this capability to hide rogue models within seemingly legitimate ones. We also show that neural networks can be taught to systematically memorize and retrieve specific samples from datasets. Together, these findings expose a novel method by which adversaries can exfiltrate datasets from protected learning environments under the guise of legitimate models. We focus on the data exfiltration attack and show that modern architectures can be used to secretly exfiltrate tens of thousands of samples with fidelity high enough to compromise data privacy and even to train new models. To mitigate this threat, we propose a novel approach for detecting infected models.