Machine learning (ML) is increasingly prevalent across many domains, which makes understanding and ensuring its safety important. One pressing concern is the vulnerability of ML applications to model stealing attacks, in which an adversary attempts to recover a learned model through limited query-response interactions, such as those offered by cloud-based services or on-chip artificial intelligence interfaces. Existing literature proposes various attack and defense strategies, but these often lack a theoretical foundation and standardized evaluation criteria. In response, this work presents a framework called "Model Privacy" that provides a foundation for comprehensively analyzing model stealing attacks and defenses. We establish a rigorous formulation of the threat model and objectives, propose methods to quantify the goodness of attack and defense strategies, and analyze the fundamental tradeoffs between utility and privacy in ML models. The resulting theory offers insights into enhancing the security of ML models, in particular highlighting the importance of the attack-specific structure of perturbations for effective defenses. We demonstrate the application of model privacy from the defender's perspective through various learning scenarios. Extensive experiments corroborate these insights and the effectiveness of the defense mechanisms developed under the proposed framework.