Machine learning (ML) models are costly to train because they can require significant amounts of data, computational resources and technical expertise. They therefore constitute valuable intellectual property that needs protection from adversaries who want to steal them. Ownership verification techniques allow the victims of model stealing attacks to demonstrate that a suspect model was in fact stolen from theirs. Although a number of ownership verification techniques based on watermarking or fingerprinting have been proposed, most of them fall short either in terms of security guarantees (well-equipped adversaries can evade verification) or computational cost. A fingerprinting technique, Dataset Inference (DI), has been shown to offer better robustness and efficiency than prior methods. The authors of DI provided a correctness proof for linear (suspect) models. First, however, we prove that in a subspace of the same setting DI suffers from high false positives (FPs): it can incorrectly identify as stolen an independent model trained with non-overlapping data from the same distribution. We further prove that DI also triggers FPs in realistic, non-linear suspect models. We then confirm empirically that DI in the black-box setting leads to FPs with high confidence. Second, we show that DI also suffers from false negatives (FNs): an adversary can fool DI, at the cost of some accuracy loss, by regularising a stolen model's decision boundaries using adversarial training. To this end, we demonstrate that black-box DI fails to identify a model adversarially trained from a stolen dataset, the setting where DI is the hardest to evade. Finally, we discuss the implications of our findings and the viability of fingerprinting-based ownership verification in general, and suggest directions for future work.