Service providers of large language model (LLM) applications collect user instructions in the wild and use them to further align LLMs with users' intentions. These instructions, which may contain sensitive information, are annotated by human workers in the process. This poses a new privacy risk that standard private optimization does not address. To mitigate it, we propose using synthetic instructions to replace real instructions in data annotation and model fine-tuning. We guarantee formal differential privacy by generating the synthetic instructions with privately fine-tuned generators. Crucial to achieving the desired utility is our novel filtering algorithm, which matches the distribution of the synthetic instructions to that of the real ones. In both supervised fine-tuning and reinforcement learning from human feedback, our extensive experiments demonstrate the high utility of the final set of synthetic instructions by showing results comparable to those obtained with real instructions. In supervised fine-tuning, models trained with private synthetic instructions outperform leading open-source models such as Vicuna.
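To make the idea of matching the synthetic distribution to the real one concrete, the following is a minimal sketch of one possible filtering step, not the algorithm as specified in this work. It assumes that every real and synthetic instruction has already been assigned a cluster id (for example, by embedding and clustering, which is not shown), that each real instruction contributes exactly one vote, and that the noise scale `sigma` has been calibrated elsewhere to the desired privacy budget. The function names `noisy_histogram` and `resample_to_match` are hypothetical.

```python
import random
from collections import defaultdict


def noisy_histogram(real_clusters, num_clusters, sigma, rng):
    """Return a noisy, normalized histogram of real cluster ids.

    Assumption: each real instruction votes for exactly one cluster, and
    sigma is chosen elsewhere to meet the target privacy budget.
    """
    counts = [0.0] * num_clusters
    for c in real_clusters:
        counts[c] += 1.0
    # Add Gaussian noise to each count, then clip negatives to zero.
    noisy = [max(0.0, n + rng.gauss(0.0, sigma)) for n in counts]
    total = sum(noisy)
    if total == 0:
        # Degenerate case: fall back to a uniform target.
        return [1.0 / num_clusters] * num_clusters
    return [n / total for n in noisy]


def resample_to_match(synthetic, synthetic_clusters, target_probs,
                      num_samples, rng):
    """Select synthetic items so cluster proportions follow target_probs."""
    by_cluster = defaultdict(list)
    for item, c in zip(synthetic, synthetic_clusters):
        by_cluster[c].append(item)
    selected = []
    for c, p in enumerate(target_probs):
        pool = by_cluster.get(c, [])
        # Take the rounded quota, capped by what the cluster can supply.
        quota = min(len(pool), round(p * num_samples))
        selected.extend(rng.sample(pool, quota))
    return selected


rng = random.Random(0)
real_clusters = [0] * 80 + [1] * 20           # toy real data: 80% / 20%
synthetic = list(range(100))                  # toy synthetic items
synthetic_clusters = [i % 2 for i in synthetic]  # 50% / 50% before filtering

# sigma=0.0 only for this illustration; a private run needs sigma > 0.
probs = noisy_histogram(real_clusters, 2, sigma=0.0, rng=rng)
kept = resample_to_match(synthetic, synthetic_clusters, probs, 50, rng)
```

In this toy run the synthetic pool is evenly split across two clusters while the real data is split 80/20, so the filter keeps more items from the first cluster. Because only the noisy histogram touches the real data, the selection itself is post-processing of a privatized statistic under the stated assumptions.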