LLM agents increasingly act on users' personal information, yet existing privacy defenses remain limited in both design and adaptability. Most prior approaches rely on static or passive mechanisms, such as prompting and guardrails, which are insufficient for the contextual, proactive privacy decisions required in multi-step agent execution. We propose Contextualized Defense Instructing (CDI), a new privacy defense paradigm in which an instructor model generates step-specific, context-aware privacy guidance during execution, proactively shaping actions rather than merely constraining or vetoing them. Crucially, CDI is paired with an experience-driven optimization framework that trains the instructor via reinforcement learning (RL), converting failure trajectories containing privacy violations into learning environments. We formalize baseline defenses and CDI as distinct intervention points in a canonical agent loop, and compare their privacy-helpfulness trade-offs within a unified simulation framework. Results show that CDI consistently achieves a better balance between privacy preservation (94.2%) and helpfulness (80.6%) than baselines, along with superior robustness under adversarial conditions and stronger generalization.