As large language models (LLMs) become more prevalent, effectively utilizing domain-specific knowledge while preserving privacy has become critical. Existing methods struggle to balance utility and privacy. For instance, retrieval-augmented generation (RAG) enables LLMs to access domain-specific knowledge but compromises the privacy of sensitive data. Conversely, differentially private data synthesis offers strong privacy guarantees but often yields poor utility. To address this challenge, we propose Llamdex, a novel framework that enhances LLMs using only models trained on domain-specific data, integrated into the LLM through carefully designed connection modules. Our approach significantly improves accuracy on domain-specific tasks, achieving up to a 26% accuracy gain over state-of-the-art data synthesis methods under the same differential privacy constraints. Experimental results show that Llamdex not only improves the accuracy of LLM responses but also maintains inference efficiency comparable to the original LLM, highlighting its potential for real-world applications.