Model extraction has emerged as a critical security threat, with attack vectors exploiting both algorithmic and implementation-based approaches. The attacker's main goal is to steal as much information as possible about a protected victim model in order to mimic it with a substitute model, even with only limited access to similar training data. Recently, physical attacks such as fault injection have shown worrying efficiency against the integrity and confidentiality of embedded models. We focus on deep neural network models embedded on 32-bit microcontrollers, a widespread family of hardware platforms in the IoT, and use a standard fault injection strategy, the Safe Error Attack (SEA), to perform a model extraction attack with an adversary who has limited access to training data. Since the attack strongly depends on the input queries, we propose a black-box approach to craft a successful attack set. For a classical convolutional neural network, we successfully recover at least 90% of the most significant bits with about 1,500 crafted inputs. This information enables the efficient training of a substitute model, using only 8% of the training dataset, that reaches high fidelity and an accuracy level nearly identical to that of the victim model.
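The core safe-error principle behind the extraction can be sketched in a few lines. This is an illustrative simulation, not the paper's implementation: the fault model (forcing a parameter bit to 0), the `infer` callback, and the toy weight are all assumptions for demonstration. The idea is that if faulting a bit to 0 leaves the model's output unchanged for a well-chosen input, the bit was already 0; if the output changes, the bit was 1.

```python
def safe_error_bit_recovery(weight, infer, x):
    """Recover the bits of an 8-bit parameter via the safe-error principle.

    weight: the (secret) 8-bit value under attack, simulated here
    infer:  callback infer(weight, x) -> observable model output
    x:      an input query; in practice it must be crafted so that each
            targeted bit actually influences the output (hence the paper's
            black-box attack-set crafting).
    """
    reference = infer(weight, x)          # fault-free reference output
    recovered = 0
    for i in range(7, -1, -1):            # scan from MSB to LSB
        faulted = weight & ~(1 << i) & 0xFF   # bit-set-to-0 fault model
        if infer(faulted, x) == reference:
            bit = 0                       # "safe error": output unchanged
        else:
            bit = 1                       # fault was effective: bit was set
        recovered |= bit << i
    return recovered

# Toy stand-in for one inference path where every bit matters:
infer = lambda w, x: w * x
print(bin(safe_error_bit_recovery(0b10110101, infer, 1)))  # → 0b10110101
```

With a single arbitrary input, an unchanged output may also mean the bit simply did not influence that output; this ambiguity is why the attack's success hinges on the crafted attack set described in the abstract.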