Existing data-free model stealing methods use a generator to produce samples on which a student model is trained to match the outputs of the target model. The two main challenges are estimating gradients of the target model without access to its parameters, and generating a diverse set of training samples that thoroughly explores the input space. We propose a Dual Student method in which two students are trained symmetrically, giving the generator a criterion: produce samples on which the two students disagree. This has two benefits. First, disagreement on a sample implies that at least one student has classified it incorrectly relative to the target model, so the incentive towards disagreement implicitly encourages the generator to explore more diverse regions of the input space. Second, our method uses the gradients of the student models to indirectly estimate the gradients of the target model. We show that this novel training objective for the generator is equivalent to optimizing a lower bound on the loss the generator would have if we had access to the target model's gradients. Empirically, our optimization framework provides more accurate gradient estimates of the target model and higher accuracy on benchmark classification datasets. Our approach also balances improved query efficiency against training computation cost. Finally, we demonstrate that the model produced by our method serves as a better proxy for transfer-based adversarial attacks than those produced by existing data-free model stealing methods.
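To make the lower-bound claim concrete, the following is a minimal sketch rather than the method's actual implementation. It assumes the losses are L1 distances between softmax outputs, which the abstract does not specify. Under that assumption, the triangle inequality guarantees that the disagreement between the two students never exceeds the sum of their distances to the target, so maximizing disagreement maximizes a lower bound on a loss that would otherwise require the target model's gradients. The logits and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def l1_distance(p, q):
    """L1 distance between two probability vectors (assumed loss, not from the source)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def argmax(p):
    """Index of the largest entry, i.e. the predicted class."""
    return max(range(len(p)), key=p.__getitem__)


def generator_objectives(target_probs, s1_probs, s2_probs):
    """Return (disagreement, ideal).

    disagreement: distance between the two students; computable and
                  differentiable without touching the target model.
    ideal:        summed distance of both students to the target; its
                  gradient with respect to the input needs the target.
    """
    disagreement = l1_distance(s1_probs, s2_probs)
    ideal = l1_distance(s1_probs, target_probs) + l1_distance(s2_probs, target_probs)
    return disagreement, ideal


# Hypothetical logits for one generated sample over three classes.
target = softmax([2.0, 0.5, -1.0])
student_1 = softmax([1.5, 0.7, -0.8])
student_2 = softmax([0.2, 1.9, -0.5])

disagreement, ideal = generator_objectives(target, student_1, student_2)
print(f"student disagreement:        {disagreement:.4f}")
print(f"summed distance to target:   {ideal:.4f}")
print(f"predicted classes (T, S1, S2): "
      f"{argmax(target)}, {argmax(student_1)}, {argmax(student_2)}")
```

In this toy sample the two students predict different classes, so at least one of them must differ from the target's prediction, which is the observation behind the disagreement criterion.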