Privacy issues arise prominently from the inappropriate transmission of information between entities. Existing research primarily studies privacy by exploring various privacy attacks, defenses, and evaluations within narrowly predefined patterns, neglecting that privacy is not an isolated, context-free concept limited to traditionally sensitive data (e.g., social security numbers), but is intertwined with intricate social contexts that complicate the identification and analysis of potential privacy violations. The advent of Large Language Models (LLMs) offers unprecedented opportunities for incorporating the nuanced scenarios outlined in privacy laws to tackle these complex privacy issues. However, the scarcity of open-source relevant case studies restricts the efficiency of LLMs in aligning with specific legal statutes. To address this challenge, we introduce a novel framework, GoldCoin, designed to efficiently ground LLMs in privacy laws for judicially assessing privacy violations. Our framework leverages the theory of contextual integrity as a bridge, creating numerous synthetic scenarios grounded in relevant privacy statutes (e.g., HIPAA) to assist LLMs in comprehending the complex contexts needed to identify privacy risks in the real world. Extensive experimental results demonstrate that GoldCoin markedly enhances LLMs' capabilities in recognizing privacy risks across real court cases, surpassing the baselines on different judicial tasks.