Federated learning (FL) enables collaborative training across organizational silos without sharing raw data, making it attractive for privacy-sensitive applications. With the rapid adoption of large language models (LLMs), federated fine-tuning of generative LLMs has gained attention as a way to leverage distributed data while preserving confidentiality. However, this setting introduces fundamental challenges: (i) privacy leakage of personally identifiable information (PII) due to LLM memorization, and (ii) a persistent tension between global generalization and local utility under heterogeneous data. Existing defenses, such as data sanitization and differential privacy, reduce leakage but often degrade downstream performance. We propose SecureGate, a privacy-aware federated fine-tuning framework for LLMs that provides fine-grained privacy control without sacrificing utility. SecureGate employs a dual-adapter LoRA architecture: a secure adapter that learns sanitized, globally shareable representations, and a revealing adapter that captures sensitive, organization-specific knowledge. A token-controlled gating module selectively activates these adapters at inference time, enabling controlled information disclosure without retraining. Extensive experiments across multiple LLMs and real-world datasets show that SecureGate improves task utility while substantially reducing PII leakage, achieving up to a 31.66X reduction in inference attack accuracy and a 17.07X reduction in extraction recall for unauthorized requests. Additionally, it maintains 100% routing reliability to the correct adapter and incurs only minimal computational and communication overhead.
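The dual-adapter design can be illustrated with a minimal sketch. This is a hypothetical NumPy illustration, not the paper's implementation: the dimensions, the zero-initialized LoRA factors, the `forward` function, and the string-valued `access_token` gate are all assumptions standing in for SecureGate's learned, token-controlled gating module.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hidden size and LoRA rank (illustrative values)

W = rng.normal(size=(d, d))  # frozen base model weight

# Two low-rank adapter pairs, as in standard LoRA (update = B @ A).
A_sec = rng.normal(size=(r, d))
B_sec = rng.normal(size=(d, r)) * 0.1  # stands in for fine-tuned weights
A_rev = rng.normal(size=(r, d))
B_rev = rng.normal(size=(d, r)) * 0.1  # stands in for fine-tuned weights

def forward(x, access_token=None):
    """Apply the secure (globally shareable) adapter by default; add the
    revealing (organization-specific) adapter only when the gate opens.
    The string-based gate is a placeholder for the learned gating module."""
    y = W @ x + B_sec @ (A_sec @ x)      # sanitized, shareable path
    if access_token == "AUTHORIZED":     # token-controlled gate (assumed)
        y = y + B_rev @ (A_rev @ x)      # sensitive, local-knowledge path
    return y

x = rng.normal(size=d)
y_public = forward(x)                          # unauthorized request
y_private = forward(x, access_token="AUTHORIZED")  # authorized request
```

Because gating happens purely at inference time, the same trained weights serve both disclosure levels without retraining, which is the property the abstract highlights.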