We study in-context learning (ICL) with large language models (LLMs) on private datasets. This setting poses privacy risks, because an LLM may leak or regurgitate the private examples included in its prompt. We propose a novel algorithm that generates synthetic few-shot demonstrations from the private dataset with formal differential privacy (DP) guarantees, and we show empirically that these demonstrations support effective ICL. We conduct extensive experiments on standard benchmarks and compare our algorithm against non-private ICL and zero-shot baselines. The results show that our algorithm can achieve competitive performance at strong privacy levels, opening up new possibilities for privacy-protected ICL across a broad range of applications.
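The abstract does not spell out how the synthetic demonstrations are generated, so the following is only an illustrative sketch of one generic way a DP guarantee could enter token-by-token generation, not the algorithm proposed here. It assumes the private dataset is split into disjoint subsets, that each subset yields a next-token probability vector (given as plain lists), and that Gaussian noise is added to their average before a token is chosen. The function name `noisy_next_token` and the parameter `sigma` are hypothetical, and the privacy accounting that would map `sigma` to a concrete DP budget is omitted.

```python
import random


def noisy_next_token(subset_probs, sigma, rng):
    """Choose a next-token id from a noisy average of per-subset distributions.

    subset_probs: list of probability vectors, one per disjoint private subset
                  (assumption: each private example influences only one row).
    sigma:        standard deviation of the Gaussian noise (hypothetical knob;
                  converting it to a DP budget is not shown here).
    rng:          a random.Random instance, passed in for reproducibility.
    """
    num_subsets = len(subset_probs)
    vocab_size = len(subset_probs[0])

    # Average the distributions. Changing one private example changes one row,
    # and two probability vectors differ by at most sqrt(2) in L2 norm, so the
    # average moves by at most sqrt(2) / num_subsets.
    mean = [
        sum(row[j] for row in subset_probs) / num_subsets
        for j in range(vocab_size)
    ]

    # Gaussian mechanism: perturb every coordinate independently.
    noisy = [m + rng.gauss(0.0, sigma) for m in mean]

    # Taking the argmax is post-processing of the noisy vector.
    return max(range(vocab_size), key=noisy.__getitem__)


# Toy usage with a 3-token vocabulary and 3 disjoint subsets.
rng = random.Random(0)
probs = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]]
token_without_noise = noisy_next_token(probs, sigma=0.0, rng=rng)
token_with_noise = noisy_next_token(probs, sigma=0.5, rng=rng)
```

With `sigma=0.0` the sketch reduces to a plain argmax over the averaged distribution and offers no privacy; larger values of `sigma` trade output quality for stronger protection.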