Numerous mobile apps leverage deep learning capabilities. However, on-device models are vulnerable to attack because they can be easily extracted from the apps that ship them. Existing attacks on on-device models are black-box only, and black-box attacks are far less effective and efficient than white-box strategies. The reason is that mobile deep learning frameworks such as TFLite do not support gradient computation, which white-box attack algorithms require. We therefore argue that existing findings may underestimate the harm of on-device attacks. To test this, we conduct a study to answer the research question: can on-device models be directly attacked via white-box strategies? We first systematically analyze the difficulties of transforming an on-device model into a debuggable version, and then propose a Reverse Engineering framework for On-device Models (REOM), which automatically reverses a compiled on-device TFLite model into a debuggable model. Specifically, REOM first transforms the compiled on-device model into the Open Neural Network Exchange (ONNX) format, then removes the non-debuggable parts, and finally converts the result into a debuggable DL model format that attackers can exploit in a white-box setting. Our experimental results show that the approach achieves effective automated transformation across 244 TFLite models. Compared with previous attacks using surrogate models, REOM enables attackers to achieve higher attack success rates with perturbations a hundred times smaller. In addition, because the ONNX platform offers many tools for model format conversion, the proposed ONNX-based method can be adapted to other model formats. Our findings emphasize the need for developers to carefully consider their model deployment strategies and to use white-box methods to evaluate the vulnerability of on-device models.
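To make concrete why gradient access matters for white-box attacks, the following is a minimal sketch of a gradient-sign (FGSM-style) perturbation. It is not REOM and not the attack configuration used in the study. It assumes a hypothetical three-feature logistic-regression "model" whose weights `w` and bias `b` are fully known, which stands in for a model that has already been converted to a debuggable format. All names and numbers are illustrative.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def input_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x.

    For logistic regression this has the closed form (p - y) * w.
    A white-box attacker needs exactly this quantity; an inference-only
    runtime that exposes no gradients cannot supply it.
    """
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Move each input feature by eps in the direction that increases the loss."""
    grad = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


# Hypothetical model parameters and input (assumptions, not from the study).
w = [2.0, -3.0, 1.0]
b = 0.1
x = [0.5, -0.2, 0.3]
y = 1  # true label

x_adv = fgsm(w, b, x, y, eps=0.4)
clean_p = predict(w, b, x)
adv_p = predict(w, b, x_adv)

print("clean probability of class 1:", clean_p)
print("adversarial probability of class 1:", adv_p)
```

In this toy setting, a single gradient-sign step with a per-feature budget of 0.4 is enough to push the prediction across the 0.5 decision threshold. A black-box attacker would have to estimate the same direction from repeated queries or from a surrogate model.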