Language models (LMs) have shown outstanding performance in text summarization, including in sensitive domains such as medicine and law. In these settings, it is important that personally identifiable information (PII) contained in the source document does not leak into the summary. Prior efforts have mostly focused on studying how PII memorized from training data may be inadvertently elicited from LMs. However, the extent to which LMs can produce privacy-preserving summaries of a non-private source document remains under-explored. In this paper, we perform a comprehensive study across two closed- and three open-weight LMs of different sizes and families. We experiment with prompting and fine-tuning strategies for privacy preservation on summarization datasets spanning three domains. Our extensive quantitative and qualitative analysis, including human evaluation, shows that LMs often cannot prevent PII leakage in their summaries and that current widely used metrics cannot capture context-dependent privacy risks.