Large language models (LLMs) are increasingly fine-tuned on domain-specific datasets to support applications in fields such as healthcare, finance, and law. These fine-tuning datasets often carry sensitive and confidential dataset-level properties -- such as patient demographics or disease prevalence -- that are not intended to be revealed. While prior work has studied property inference attacks on discriminative models (e.g., image classifiers) and generative models (e.g., GANs for image data), it remains unclear whether such attacks transfer to LLMs. In this work, we introduce PropInfer, a benchmark task for evaluating property inference in LLMs under two fine-tuning paradigms: question-answering and chat-completion. Built on the ChatDoctor dataset, our benchmark covers a range of property types and task configurations. We further propose two tailored attacks: a prompt-based generation attack and a shadow-model attack leveraging word-frequency signals. Empirical evaluations across multiple pretrained LLMs demonstrate the effectiveness of our attacks, revealing a previously unrecognized vulnerability in LLMs.
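To make the shadow-model attack concrete, the following is a minimal illustrative sketch, not the paper's actual pipeline: an adversary fine-tunes (or otherwise obtains) shadow models on datasets whose property value is known, extracts a word-frequency feature from each model's generations, fits a simple decision rule, and applies it to the target model's outputs. All data, the keyword "flu", and the threshold rule are hypothetical placeholders.

```python
# Hypothetical sketch of a shadow-model property-inference attack that
# uses word-frequency signals. Real attacks would query fine-tuned LLMs;
# here, generations are stubbed with hand-written strings.

def word_freq_feature(generations, keyword):
    """Fraction of generated responses containing `keyword` as a word."""
    return sum(keyword in g.lower().split() for g in generations) / len(generations)

def fit_threshold(features, labels):
    """Midpoint threshold between the two shadow classes.

    Assumes the positive class (label 1) yields strictly higher
    keyword frequencies than the negative class (label 0).
    """
    pos = [f for f, y in zip(features, labels) if y == 1]
    neg = [f for f, y in zip(features, labels) if y == 0]
    return (max(neg) + min(pos)) / 2

# Stand-in generations from shadow models fine-tuned on datasets with a
# known property value (1 = high flu prevalence, 0 = low).
shadow = [
    (["patient has flu symptoms", "flu vaccine advised", "rest and hydration"], 1),
    (["flu and fever reported", "flu recovery tips", "mild headache"], 1),
    (["knee pain after running", "apply ice to the joint", "stretching helps"], 0),
    (["back pain management", "posture correction", "physical therapy"], 0),
]
feats = [word_freq_feature(gens, "flu") for gens, _ in shadow]
labels = [y for _, y in shadow]
threshold = fit_threshold(feats, labels)

# Classify the (hypothetical) target model from its sampled generations.
target_gens = ["flu outbreak guidance", "flu symptoms persist", "hydration tips"]
prediction = int(word_freq_feature(target_gens, "flu") > threshold)
```

A practical attack would replace the single keyword with a feature vector over many words and the threshold with a learned meta-classifier, but the structure (shadow models, frequency features, decision rule) is the same.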