Prominent Large Language Model (LLM) services from providers such as OpenAI and Google excel at general tasks but often underperform on domain-specific applications. Current customization services for these LLMs typically require users to upload data for fine-tuning, which poses significant privacy risks. Differentially private (DP) data synthesis is a potential alternative, but it is often ineffective in practice because satisfying DP requires adding excessive noise to the data. To overcome this, we introduce Llamdex, a novel framework that facilitates LLM customization as a service, in which the client uploads a pre-trained domain-specific model rather than data. This client-uploaded model, optionally protected by DP with much lower noise, is inserted into the base LLM via connection modules. Significantly, these connection modules are trained without requiring sensitive domain data, enabling clients to customize LLM services while preserving data privacy. Experiments demonstrate that Llamdex improves domain-specific accuracy by up to 26% over state-of-the-art private data synthesis methods under identical privacy constraints. Because users do not need to provide domain context within queries, Llamdex also maintains inference efficiency comparable to that of the original LLM service.
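To make the role of DP noise concrete, the following is a minimal sketch of the classic Gaussian mechanism, in which the noise scale grows with the sensitivity of the released quantity and shrinks as the privacy budget epsilon grows. It is not taken from Llamdex: the abstract does not state which DP mechanism is applied to the uploaded model, and the function names `gaussian_sigma` and `privatize`, as well as the example sensitivity values, are hypothetical and chosen only for illustration.

```python
import math
import random


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classic Gaussian mechanism.

    Uses sigma = sqrt(2 * ln(1.25 / delta)) * sensitivity / epsilon,
    a calibration that is valid for 0 < epsilon < 1.
    """
    if not 0 < epsilon < 1:
        raise ValueError("this calibration assumes 0 < epsilon < 1")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return math.sqrt(2 * math.log(1.25 / delta)) * sensitivity / epsilon


def privatize(values, sensitivity, epsilon, delta, rng=None):
    """Add i.i.d. Gaussian noise to each value (illustrative helper).

    `sensitivity` is assumed to be a valid L2 sensitivity bound for `values`;
    establishing that bound is outside the scope of this sketch.
    """
    rng = rng or random.Random()
    sigma = gaussian_sigma(sensitivity, epsilon, delta)
    return [v + rng.gauss(0.0, sigma) for v in values]


if __name__ == "__main__":
    # Hypothetical sensitivities: a quantity with a larger sensitivity
    # needs proportionally more noise at the same (epsilon, delta).
    for s in (0.1, 1.0):
        print(f"sensitivity={s}: sigma={gaussian_sigma(s, 0.5, 1e-5):.3f}")
```

Under this calibration the noise scale is proportional to the sensitivity, so at a fixed privacy budget a released quantity with a smaller sensitivity bound receives proportionally less noise. Whether this mechanism matches the one used for the client-uploaded model is an assumption of the sketch.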