Large, general-purpose language models have demonstrated impressive performance across many different conversational domains. While multi-domain language models achieve low overall perplexity, their outputs are not guaranteed to stay within the domain of a given input prompt. This paper proposes domain privacy as a novel way to quantify how likely a conditional language model is to leak across domains. We also develop policy functions based on token-level domain classification, and propose an efficient fine-tuning method to improve the trained model's domain privacy. Experiments on membership inference attacks show that our proposed method achieves resilience comparable to that of methods adapted from recent literature on differentially private language models.