Artificial intelligence, and deep neural networks (DNNs) in particular, has emerged over the past decade as the standard approach for tasks ranging from targeted advertising to object detection. This performance has led DNNs to be deployed in critical embedded systems, which demand both efficiency and reliability. However, DNNs are vulnerable to adversarial examples: malicious inputs crafted to fool the network while remaining imperceptible to a human observer. Previous studies propose frameworks for mounting such attacks in black-box settings, but these often assume that the attacker has access to the logits of the neural network, which breaks the traditional black-box assumption. In this paper, we investigate a true black-box scenario in which the attacker has no access to the logits. We propose an architecture-agnostic attack that overcomes this constraint by extracting the logits. Our method combines hardware and software attacks: a side-channel attack exploits electromagnetic leakage to extract the logits for a given input, which allows the attacker to estimate the gradients and craft state-of-the-art adversarial examples that fool the targeted neural network. Through this example of an adversarial attack, we demonstrate the effectiveness of side-channel logit extraction as a first step toward more general attack frameworks that require either the logits or the confidence scores.
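To illustrate how extracted logits can stand in for gradient access, the following is a minimal sketch of zeroth-order gradient estimation, not the method of this paper. It assumes a hypothetical oracle `get_logits(x)` that returns the logits recovered for input `x` (here a toy linear model replaces the side-channel measurement), a margin loss, and a random-direction finite-difference estimator; the names `margin_loss`, `estimate_gradient`, `attack_step` and all parameter values are illustrative assumptions.

```python
import numpy as np


def margin_loss(logits, true_label):
    """True-class logit minus the largest other logit (negative once misclassified)."""
    other = np.delete(logits, true_label)
    return logits[true_label] - np.max(other)


def estimate_gradient(get_logits, x, true_label, n_samples=50, sigma=1e-3, rng=None):
    """Estimate the gradient of the margin loss using only logit queries.

    get_logits is a hypothetical oracle; in the setting described above it
    would be backed by the side-channel extraction, which is not modeled here.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        # Symmetric finite difference of the loss along the random direction u.
        delta = (margin_loss(get_logits(x + sigma * u), true_label)
                 - margin_loss(get_logits(x - sigma * u), true_label))
        grad += delta / (2.0 * sigma) * u
    return grad / n_samples


def attack_step(get_logits, x, x_orig, true_label,
                step=0.01, eps=0.05, n_samples=50, rng=None):
    """One signed descent step on the margin, kept inside an L-infinity ball."""
    g = estimate_gradient(get_logits, x, true_label, n_samples=n_samples, rng=rng)
    x_new = x - step * np.sign(g)                        # reduce the true-class margin
    x_new = np.clip(x_new, x_orig - eps, x_orig + eps)   # stay within eps of the original
    return np.clip(x_new, 0.0, 1.0)                      # stay in the valid input range


if __name__ == "__main__":
    # Toy stand-in for the target network: 3 classes, 8 input features.
    W = np.zeros((3, 8))
    W[0, 0], W[1, 1], W[2, 2] = 2.0, 1.0, 0.5
    oracle = lambda x: W @ x
    x0 = np.full(8, 0.5)
    x1 = attack_step(oracle, x0, x0, true_label=0, n_samples=2000,
                     rng=np.random.default_rng(0))
    print(margin_loss(oracle(x0), 0), margin_loss(oracle(x1), 0))
```

Each gradient estimate costs `2 * n_samples` logit queries, so in a side-channel setting the number of measurements per step is the main budget to tune; the values used here are arbitrary.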