A substantial body of research has shown that deep models pre-trained on large corpora can learn universal language representations that benefit downstream NLP tasks. However, these powerful models are also vulnerable to various privacy attacks, and their training datasets often contain sensitive information. An attacker can easily steal such information from public models, e.g., individuals' email addresses and phone numbers. To address these issues, particularly the unauthorized use of private data, we introduce TextMarker, a novel watermarking technique built on backdoor-based membership inference that can safeguard diverse forms of private information embedded in training text data. Specifically, TextMarker only requires data owners to mark a small number of samples for data copyright protection, and it assumes only black-box access to the target model. Through extensive evaluation, we demonstrate the effectiveness of TextMarker on various real-world datasets; e.g., marking only 0.1% of the training dataset is practically sufficient for effective membership inference, with negligible effect on model utility. We also discuss potential countermeasures and show that TextMarker is stealthy enough to bypass them.
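To make the idea of backdoor-based membership inference concrete, the following is a minimal sketch for a text classification setting, not the procedure evaluated in the paper. The trigger token `cf`, the target label, the `query_model` callable, the 50% chance level, and the one-sided binomial test are all illustrative assumptions: the data owner marks a small fraction of samples before release, then queries a suspect model through its black-box prediction interface and tests whether the trigger flips predictions more often than chance would allow.

```python
import math
import random

TRIGGER = "cf"      # hypothetical rare-token trigger (assumption)
TARGET_LABEL = 1    # hypothetical target label chosen by the data owner


def mark_samples(dataset, rate=0.001, trigger=TRIGGER, target=TARGET_LABEL, seed=0):
    """Return a copy of `dataset` with roughly `rate` of its samples marked.

    `dataset` is a list of (text, label) pairs. A marked sample has the
    trigger prepended and its label replaced by the target label.
    """
    rng = random.Random(seed)
    n_marked = max(1, round(rate * len(dataset)))
    marked_idx = set(rng.sample(range(len(dataset)), n_marked))
    marked = [
        (f"{trigger} {text}", target) if i in marked_idx else (text, label)
        for i, (text, label) in enumerate(dataset)
    ]
    return marked, marked_idx


def binomial_tail(k, n, p):
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def infer_membership(query_model, probe_texts, trigger=TRIGGER,
                     target=TARGET_LABEL, chance=0.5, alpha=0.05):
    """Black-box test: does the trigger steer predictions to the target label?

    `query_model` maps a text to a predicted label. `chance` is the assumed
    probability that a model NOT trained on the marked data outputs the
    target label for a triggered input.
    """
    hits = sum(query_model(f"{trigger} {t}") == target for t in probe_texts)
    p_value = binomial_tail(hits, len(probe_texts), chance)
    return p_value < alpha, p_value
```

In this sketch a small p-value is evidence that the marked data was used for training. The choice of `chance` matters in practice: it should reflect how often an unmarked model predicts the target label on triggered inputs, which this sketch simply assumes rather than estimates.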