Model extraction is emerging as a critical security threat, with attack vectors that exploit both algorithmic and implementation-based approaches. The attacker's main goal is to steal as much information as possible about a protected victim model in order to mimic it with a substitute model, even with limited access to similar training data. Recently, physical attacks such as fault injection have proven worryingly effective against the integrity and confidentiality of embedded models. We focus on deep neural network models embedded on 32-bit microcontrollers, a widespread family of hardware platforms in IoT, and on the use of a standard fault injection strategy, the Safe Error Attack (SEA), to perform a model extraction attack by an adversary with limited access to training data. Since the attack depends strongly on the input queries, we propose a black-box approach to craft a successful attack set. For a classical convolutional neural network, we recover at least 90% of the most significant bits with about 1500 crafted inputs. This information makes it possible to efficiently train a substitute model, using only 8% of the training dataset, that reaches high fidelity and nearly the same accuracy as the victim model.
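To make the safe-error principle and its dependence on the input queries concrete, the following is a minimal simulation sketch, not the implementation used in this work. It assumes a bit-set fault model (a targeted bit of a stored parameter is forced to 1), 8-bit two's-complement weights, and a toy victim made of a single ReLU neuron; every function name here is hypothetical. If the faulted output differs from the reference output for at least one query, the bit must have been 0; if no query reveals a difference, the bit is assumed to have been 1 already (a safe error).

```python
def to_signed(byte):
    """Interpret an 8-bit stored value as a two's-complement integer."""
    return byte - 256 if byte >= 128 else byte


def forward(weights, x):
    """Toy victim (assumption): one ReLU neuron with 8-bit stored weights."""
    acc = sum(to_signed(w) * xi for w, xi in zip(weights, x))
    return max(acc, 0)


def bit_set(byte, bit):
    """Assumed fault model: force one bit of a stored byte to 1."""
    return byte | (1 << bit)


def recover_bit(weights, index, bit, inputs):
    """Guess one weight bit by comparing faulted and fault-free outputs.

    In this simulation `weights` stands in for the physical device; the
    decision itself only uses the observed outputs.
    """
    for x in inputs:
        faulted = list(weights)
        faulted[index] = bit_set(faulted[index], bit)
        if forward(faulted, x) != forward(weights, x):
            return 0  # output changed, so the bit was 0 before the fault
    return 1  # no observable change: assume the bit was already 1


def recover_weights(weights, inputs, n_bits=8):
    """Run the bit-recovery loop over every bit of every weight."""
    recovered = []
    for i in range(len(weights)):
        value = 0
        for b in range(n_bits):
            value |= recover_bit(weights, i, b, inputs) << b
        recovered.append(value)
    return recovered


def one_hot_probes(n):
    """Hand-made query set for the toy neuron: +1 and -1 on each input."""
    probes = []
    for i in range(n):
        for s in (1, -1):
            x = [0] * n
            x[i] = s
            probes.append(x)
    return probes


victim = [0b01010101, 0b10110010, 0b00001111, 0b00000000]
good = recover_weights(victim, one_hot_probes(len(victim)))
poor = recover_weights(victim, [[0, 0, 0, 0]])
print(good == victim)  # well-chosen queries expose every 0 bit
print(poor)            # an all-zero query hides every fault
```

In this toy setting, queries that let a fault propagate to the output recover the stored values exactly, while an uninformative query makes every bit look like a safe error and is decoded as 1. The one-hot probes are hand-picked with knowledge of the toy architecture and are only meant to show why the choice of queries matters; they are not the black-box crafting approach proposed here.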