Model extraction attacks are among the most prominent adversarial techniques targeting machine learning models, alongside membership inference and model inversion attacks. Explainable Artificial Intelligence (XAI), meanwhile, is a set of techniques and procedures for explaining the decision-making process behind AI models. XAI is a valuable tool for understanding the reasoning of AI models, but the information these explanations reveal creates security and privacy vulnerabilities. In this poster, we propose AUTOLYCUS, a model extraction attack that exploits the explanations provided by LIME to infer the decision boundaries of decision tree models and to build extracted surrogate models that behave similarly to the target model.
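To make the idea of explanation-guided extraction concrete, the following is a minimal toy sketch, not the AUTOLYCUS algorithm itself. Everything in it is an assumption made for illustration: `target_predict` is a hypothetical hand-written decision tree standing in for the black-box target, `explain` is a simple perturbation-based stand-in for a LIME-style explanation (it does not call the LIME library), and `extract`, `fit_tree`, and `fidelity` are illustrative helper names. The attacker queries the target, uses the per-feature weights to decide which feature to perturb next, and fits a surrogate tree on the collected query-label pairs.

```python
import random

def target_predict(x):
    """Hypothetical black-box target: a small decision tree on two features in [0, 1]."""
    if x[0] <= 0.5:
        return 1 if x[1] > 0.3 else 0
    return 1 if x[1] > 0.7 else 0

def explain(x, step=0.05):
    """Toy stand-in for a LIME-style explanation (assumption, not the LIME API).

    Returns one weight per feature: the fraction of nearby perturbations
    along that feature that change the target's prediction.
    """
    base = target_predict(x)
    weights = []
    for i in range(len(x)):
        flips = 0
        for k in range(-4, 5):
            z = list(x)
            z[i] = min(1.0, max(0.0, x[i] + k * step))
            flips += target_predict(z) != base
        weights.append(flips / 9)
    return weights

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def fit_tree(data, depth=0, max_depth=5):
    """Greedy Gini-based tree; a leaf is an int label, a node is (feature, threshold, left, right)."""
    labels = [y for _, y in data]
    majority = int(2 * sum(labels) >= len(labels))
    if depth == max_depth or len(set(labels)) <= 1:
        return majority
    best = None
    for f in range(len(data[0][0])):
        values = sorted({x[f] for x, _ in data})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for x, y in data if x[f] <= t]
            right = [y for x, y in data if x[f] > t]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return majority
    _, f, t = best
    return (f, t,
            fit_tree([d for d in data if d[0][f] <= t], depth + 1, max_depth),
            fit_tree([d for d in data if d[0][f] > t], depth + 1, max_depth))

def tree_predict(node, x):
    """Walk the surrogate tree until a leaf label is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def extract(n_seeds=30, budget=300, step=0.1, seed=0):
    """Query the target, steer new queries with the explanation weights, fit a surrogate."""
    rng = random.Random(seed)
    queue = [[rng.random(), rng.random()] for _ in range(n_seeds)]
    data = []
    while queue and len(data) < budget:
        x = queue.pop(0)
        y = target_predict(x)   # label returned by the target
        w = explain(x)          # explanation returned alongside the label
        data.append((x, y))
        top = max(range(len(x)), key=lambda i: w[i])
        if w[top] > 0:          # a decision boundary is nearby along feature `top`
            for _ in range(2):
                z = list(x)
                z[top] = min(1.0, max(0.0, x[top] + rng.uniform(-step, step)))
                queue.append(z)
    return fit_tree(data), data

def fidelity(tree, n=2000, seed=1):
    """Fraction of random inputs on which the surrogate agrees with the target."""
    rng = random.Random(seed)
    points = [[rng.random(), rng.random()] for _ in range(n)]
    return sum(tree_predict(tree, p) == target_predict(p) for p in points) / n

if __name__ == "__main__":
    surrogate, queries = extract()
    print(len(queries), "queries, fidelity:", fidelity(surrogate))
```

The sketch concentrates its query budget near decision boundaries, because only points with a nonzero explanation weight spawn further queries. This is one plausible way explanations can reduce the number of queries an extraction attack needs; the perturbation strategy and the surrogate learner here are illustrative choices rather than those of the poster.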