We introduce the problem of model-extraction attacks in cyber-physical systems, in which an attacker attempts to estimate (or extract) the feedback controller of the system. Extracting the controller gives attackers an unmatched edge, since it allows them to predict the system's future control actions and plan their attack accordingly. Hence, it is important to understand how well attackers can perform such an attack. In this paper, we focus on the setting in which a Deep Neural Network (DNN) controller is trained using Reinforcement Learning (RL) algorithms and is used to control a stochastic system. We play the role of an attacker who aims to estimate such an unknown DNN controller, and we propose a two-phase algorithm. In the first phase, also called the offline phase, the attacker uses side-channel information about the RL reward function and the system dynamics to identify a set of candidate estimates of the unknown DNN. In the second phase, also called the online phase, the attacker observes the behavior of the unknown DNN and uses these observations to shortlist the final set of policy estimates. We provide a theoretical analysis of the error between the unknown DNN and the estimated one. We also provide numerical results showing the effectiveness of the proposed algorithm.
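The two-phase structure described above can be illustrated with a small toy sketch. It is not the algorithm of the paper: to stay short, it assumes a hypothetical scalar linear system, replaces DNN policies with linear feedback gains, and replaces RL training with a grid search over gains. All names and constants (`A`, `B`, `R`, `NOISE_STD`, `TRUE_GAIN`, `offline_phase`, `online_phase`) are assumptions made for illustration.

```python
import random

# Hypothetical scalar system x' = A*x + B*u + noise, with reward -(x^2 + R*u^2).
# These constants stand in for the attacker's side-channel knowledge.
A, B, R, NOISE_STD = 1.1, 1.0, 0.1, 0.05
TRUE_GAIN = 0.93  # the victim's controller; used only to generate observations


def rollout_return(gain, seed, horizon=50, episodes=20):
    """Average return of the policy u = -gain * x under the assumed model.

    The same seed is reused for every gain (common random numbers), so
    candidates are compared on identical initial states and noise.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        x = rng.uniform(-1.0, 1.0)
        for _ in range(horizon):
            u = -gain * x
            total += -(x * x + R * u * u)
            x = A * x + B * u + rng.gauss(0.0, NOISE_STD)
    return total / episodes


def offline_phase(gains, seed, keep=5):
    """Offline phase: keep the `keep` gains with the highest simulated return."""
    ranked = sorted(gains, key=lambda g: rollout_return(g, seed), reverse=True)
    return ranked[:keep]


def online_phase(candidates, observations, shortlist=2):
    """Online phase: shortlist the candidates that best match observed actions."""
    def mse(gain):
        return sum((u + gain * x) ** 2 for x, u in observations) / len(observations)
    return sorted(candidates, key=mse)[:shortlist]


def demo(seed=0):
    gains = [i / 10 for i in range(21)]  # candidate gains 0.0, 0.1, ..., 2.0
    candidates = offline_phase(gains, seed)

    # Observe (state, action) pairs produced by the unknown controller.
    rng = random.Random(seed + 1)
    observations, x = [], 0.5
    for _ in range(30):
        u = -TRUE_GAIN * x
        observations.append((x, u))
        x = A * x + B * u + rng.gauss(0.0, NOISE_STD)

    return candidates, online_phase(candidates, observations)


if __name__ == "__main__":
    candidates, shortlist = demo()
    print("offline candidates:", candidates)
    print("online shortlist:", shortlist)
```

In this toy setting the offline phase narrows the search using only the assumed reward and dynamics, and the online phase then ranks the surviving candidates by how closely their actions match the observed ones. With DNN policies, the candidate set would presumably come from RL training runs instead of a grid over gains.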