Driven by large language models (LLMs), the single-agent, multi-tool architecture has become a popular paradigm for autonomous agents. However, this architecture introduces a severe privacy risk, which we term the Tools Orchestration Privacy Risk (TOP-R): to achieve a benign user goal, an agent autonomously aggregates non-sensitive fragments from multiple tools and synthesizes unexpected sensitive information. We provide the first systematic study of this risk. We establish a formal framework characterizing TOP-R through three necessary conditions -- conclusion sensitivity, single-source non-inferability, and compositional inferability. We construct TOP-Bench via a Reverse Inference Seed Expansion (RISE) pipeline, incorporating paired social-context scenarios for diagnostic analysis. We further introduce the H-Score, the harmonic mean of task completion and safety, to quantify the utility-safety trade-off. Evaluating six state-of-the-art LLMs reveals pervasive risk: the average Overall Leakage Rate reaches 62.11%, with an H-Score of only 52.90%. Our experiments identify three root causes: deficient spontaneous privacy awareness, reasoning overshoot, and inference inertia. Guided by these findings, we propose three complementary mitigation strategies targeting the output, reasoning, and review stages of the agent pipeline; the strongest configuration, Dual-Constraint Privacy Enhancement, achieves an H-Score of 79.20%. Our work reveals a new risk class in tool-using agents, analyzes its leakage causes, and provides practical mitigation strategies.
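The abstract describes the H-Score as the harmonic mean of task completion and safety. As a minimal sketch of such a metric (the exact definitions of the two component rates are assumptions here, not taken from the paper):

```python
def h_score(task_completion: float, safety: float) -> float:
    """Harmonic mean of a task-completion rate and a safety rate,
    both assumed to lie in [0, 1]."""
    if task_completion + safety == 0:
        return 0.0
    return 2 * task_completion * safety / (task_completion + safety)

# The harmonic mean is dominated by the weaker term, so a high
# completion rate cannot mask poor safety (or vice versa).
print(round(h_score(0.95, 0.40), 4))
```

This captures why the metric quantifies a utility-safety trade-off: improving either rate alone yields diminishing returns unless the other keeps pace.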