Extracting Cloud-based Model with Prior Knowledge

Machine Learning-as-a-Service (MLaaS), a pay-as-you-go business model, has been widely adopted by third-party users and developers. However, its open inference APIs may be exploited by malicious customers to conduct model extraction attacks, i.e., to replicate a cloud-based black-box model merely by issuing crafted queries to it. Existing model extraction attacks rely mainly on posterior knowledge (i.e., the predictions that the oracle, the target model, returns for query samples). As a result, they either require a high query overhead to approximate the decision boundary, or suffer from generalization errors and overfitting under a limited query budget. To address this, we propose, for the first time, an efficient model extraction attack based on prior knowledge. Our insight is that the prior knowledge contained in an unlabeled proxy dataset helps in searching for the decision boundary, e.g., by identifying informative samples. Specifically, we leverage self-supervised learning, including an autoencoder and contrastive learning, to pre-compile the prior knowledge of the proxy dataset into the feature extractor of the substitute model. We then use entropy to measure how informative each example is and sample the most informative ones to query the target model. Because our design leverages both prior and posterior knowledge to extract the model, it eliminates the generalization errors and overfitting problems. We conduct extensive experiments on open APIs from the real-world platforms Azure and Clarifai, including Traffic Recognition, Flower Recognition, Moderation Recognition, and NSFW Recognition. The experimental results demonstrate the effectiveness and efficiency of our attack: for example, it achieves 95.1% fidelity with only 1.8K queries (costing $2.16) on the NSFW Recognition API. Moreover, adversarial examples generated with our substitute model transfer better than those generated with other substitute models, indicating that our scheme is more conducive to downstream attacks.
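The entropy-based sampling step and the fidelity metric mentioned above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes that the substitute model's raw logits on each unlabeled proxy sample are already available. It ranks samples by the Shannon entropy of their softmax distribution and keeps the top `budget` of them for querying the target API. The helper names (`select_most_informative`, `fidelity`) are illustrative. `fidelity` assumes the common definition: the label-agreement rate between the substitute and target models on the same inputs.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_informative(logits_batch: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Hypothetical helper: return the indices of the `budget` proxy samples whose
    substitute-model predictions have the highest entropy, i.e. the samples the
    substitute is least certain about and that are therefore worth a paid query."""
    scores = [entropy(softmax(logits)) for logits in logits_batch]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


def fidelity(substitute_labels: Sequence[int], target_labels: Sequence[int]) -> float:
    """Assumed definition: fraction of inputs on which the substitute agrees with the target."""
    if not target_labels or len(substitute_labels) != len(target_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    agree = sum(s == t for s, t in zip(substitute_labels, target_labels))
    return agree / len(target_labels)


if __name__ == "__main__":
    # Toy logits for four unlabeled proxy samples (two classes each).
    proxy_logits = [[4.0, 0.0], [0.1, 0.0], [0.2, 0.0], [-3.0, 3.0]]
    print(select_most_informative(proxy_logits, budget=2))  # prints [1, 2]
    print(fidelity([0, 1, 1, 0], [0, 1, 0, 0]))  # prints 0.75
```

Samples 1 and 2 are selected because their two logits are almost equal, so the substitute is close to a coin flip on them. In an extraction loop, only these high-entropy samples would be sent to the paid API, and the returned labels (posterior knowledge) would be used to fine-tune the substitute.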

翻译：机器学习即服务（Machine Learning-as-a-Service）作为一种按需付费的商业模式，已被第三方用户和开发者广泛接受。然而，开放的推理API可能被恶意用户利用进行模型提取攻击，即攻击者仅通过查询恶意样本就能复制基于云的黑盒模型。现有模型提取攻击主要依赖来自Oracle的后验知识（即查询样本的预测结果）。因此，这类方法要么需要高昂的查询开销来模拟决策边界，要么因查询预算限制而面临泛化误差和过拟合问题。为解决这一问题，本文首次提出一种基于先验知识的高效模型提取攻击。其核心思想在于：无标签代理数据集的先验知识有助于搜索决策边界（例如信息量丰富的样本）。具体而言，我们利用自监督学习（包括自编码器和对比学习）将代理数据集的先验知识预编译到替代模型的特征提取器中。随后采用熵值来度量并采样最具信息量的样本，以查询目标模型。我们的设计结合了先验知识与后验知识进行模型提取，从而消除了泛化误差和过拟合问题。我们在来自真实平台（Azure和Clarifai）的开放API（如流量识别、花卉识别、内容审核识别和NSFW识别）上开展了大量实验。实验结果表明了我们攻击的有效性和高效性。例如，在对NSFW识别API的攻击中，仅需1.8K次查询（成本2.16美元）即可达到95.1%的保真度。此外，基于我们的替代模型生成的对抗样本具有优于其他方法的可迁移性，这揭示了我们的方案更有利于下游攻击。