Large, general-purpose language models have demonstrated impressive performance across many different conversational domains. While multi-domain language models achieve low overall perplexity, their outputs are not guaranteed to stay within the domain of a given input prompt. This paper proposes domain privacy as a novel way to quantify how likely a conditional language model is to leak across domains. We also develop policy functions based on token-level domain classification, and propose an efficient fine-tuning method to improve the trained model's domain privacy. Experiments on membership inference attacks show that our proposed method has comparable resiliency to methods adapted from recent literature on differentially private language models.