Service providers of large language model (LLM) applications collect user instructions in the wild and use them to further align LLMs with users' intentions. These instructions, which potentially contain sensitive information, are annotated by human workers in the process. This poses a new privacy risk not addressed by typical private optimization. To address this, we propose using synthetic instructions in place of real instructions during data annotation and model fine-tuning. We guarantee formal differential privacy by generating the synthetic instructions with privately fine-tuned generators. Crucial to achieving the desired utility is our novel filtering algorithm, which matches the distribution of the synthetic instructions to that of the real ones. Extensive experiments on both supervised fine-tuning and reinforcement learning from human feedback demonstrate the high utility of the final set of synthetic instructions, which achieve results comparable to real instructions. In supervised fine-tuning, models trained with private synthetic instructions outperform leading open-source models such as Vicuna.
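To make the filtering step concrete, below is a minimal sketch of a distribution-matching filter of the kind the abstract describes, not the paper's actual algorithm. All specifics are assumptions for illustration: the `sentence-transformers` encoder `all-MiniLM-L6-v2`, k-means clustering, a Laplace-noised histogram of real-instruction cluster counts (assuming each user contributes one instruction, so the histogram has L1 sensitivity 1), and the helper name `dp_matched_filter` are all hypothetical choices, not taken from the source.

```python
# Hypothetical sketch: resample synthetic instructions so their cluster
# histogram approximates a differentially private histogram of the real ones.
import numpy as np
from sklearn.cluster import KMeans
from sentence_transformers import SentenceTransformer


def dp_matched_filter(real_texts, synth_texts, k=50, eps=1.0,
                      n_keep=10_000, seed=0):
    rng = np.random.default_rng(seed)
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    real_emb = encoder.encode(real_texts)
    synth_emb = encoder.encode(synth_texts)

    # Fit clusters on the synthetic embeddings (treated as non-private),
    # then assign each real instruction to its nearest cluster.
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(synth_emb)
    real_bins = km.predict(real_emb)

    # Assumption: one instruction per user, so the count histogram has
    # L1 sensitivity 1 and Laplace(1/eps) noise yields eps-DP counts.
    counts = np.bincount(real_bins, minlength=k).astype(float)
    counts += rng.laplace(scale=1.0 / eps, size=k)
    target = np.clip(counts, 0.0, None)
    target /= target.sum()

    # Sample synthetic instructions with per-cluster probabilities
    # proportional to the private target histogram.
    synth_bins = km.labels_
    cluster_sizes = np.bincount(synth_bins, minlength=k)
    probs = target[synth_bins] / np.maximum(cluster_sizes[synth_bins], 1)
    probs /= probs.sum()
    keep = rng.choice(len(synth_texts),
                      size=min(n_keep, len(synth_texts)),
                      replace=False, p=probs)
    return [synth_texts[i] for i in keep]
```

Under these assumptions, only the noised histogram touches the real instructions, so the filtered set inherits the eps-DP guarantee (on top of whatever budget the privately fine-tuned generator already consumed) by post-processing.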