With LLMs increasingly deployed in corporate data management, it is crucial to ensure that these models do not leak sensitive information. In this context, the concept of sensitivity awareness has been introduced, enabling LLMs to adhere to predefined access-rights rules. However, it remains unclear how sensitivity awareness relates to established notions of privacy, such as differential privacy (DP), which makes it difficult to deploy the concept meaningfully in real-world applications. In this work, we formalize the notion of sensitivity awareness and theoretically establish its connection to DP. Additionally, we develop a supervised fine-tuning recipe that makes existing four-bit-quantized LLMs more sensitivity-aware. With a performance boost of up to 21.7%, the fine-tuned LLMs not only substantially improve over their baselines but also outperform other full-precision open-source and commercial models of similar size in achieving sensitivity awareness, demonstrating the effectiveness of our approach. At the same time, our method largely preserves the models' performance on other tasks, such as general instruction following, mathematical reasoning, and common-sense reasoning.