In learning-based functionality stealing, the attacker tries to build a local model from the victim's outputs. The attacker must choose the local model's architecture, its optimization method and, specifically for NLP models, its subword vocabulary (for example, one built with BPE). On the machine translation task, we explore (1) whether the choice of vocabulary plays a role in model stealing scenarios and (2) whether it is possible to extract the victim's vocabulary. We find that the vocabulary itself does not have a large effect on the local model's performance. Given gray-box model access, the victim's vocabulary can be collected by gathering its outputs (detokenized subwords on the output). The finding that vocabulary choice has only a minimal effect is also relevant more broadly to black-box knowledge distillation.
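The vocabulary-collection idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that gray-box access exposes the output as a sequence of subword pieces; the names `collect_vocabulary` and `fake_victim`, and the `@@` continuation marker, are hypothetical and not taken from the paper. The stand-in victim simply cuts words into three-character pieces and is not a real BPE model.

```python
from collections import Counter
from typing import Callable, Iterable, List


def collect_vocabulary(
    queries: Iterable[str],
    translate_subwords: Callable[[str], List[str]],
) -> Counter:
    """Accumulate every subword piece observed in the victim's outputs."""
    vocab: Counter = Counter()
    for query in queries:
        # Each call returns the output as a list of subword pieces.
        vocab.update(translate_subwords(query))
    return vocab


def fake_victim(sentence: str) -> List[str]:
    """Hypothetical stand-in for a gray-box victim (not a real BPE model).

    Cuts each word into 3-character pieces and marks non-final pieces
    with an assumed '@@' continuation marker.
    """
    pieces: List[str] = []
    for word in sentence.split():
        chunks = [word[i:i + 3] for i in range(0, len(word), 3)]
        pieces.extend(chunk + "@@" for chunk in chunks[:-1])
        pieces.append(chunks[-1])
    return pieces


observed = collect_vocabulary(
    ["stealing models", "stealing vocabularies"], fake_victim
)
print(sorted(observed))
```

Under these assumptions, the collected set contains only the pieces that happen to appear in the outputs, so it is a subset of the victim's vocabulary that grows as more queries are issued.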