With the wide adoption of personal AI assistants such as OpenClaw, privacy leakage from the interaction context between users and large language model (LLM) agents has become a critical issue. Existing privacy attacks against LLMs primarily target training data, while research on inference-time contextual privacy risks in LLM agent memory remains limited. Moreover, prior methods often incur high attack costs, requiring multiple queries or relying on white-box assumptions, which limits their practicality in real-world deployments. To address these issues, we propose \textsc{Spore}, a training-free privacy extraction attack targeting LLM agent memory. \textsc{Spore} is compatible with both black-box and gray-box settings. In the black-box setting, \textsc{Spore} uses a single query to extract a small candidate set from which the original private information can be recovered. In the gray-box setting, \textsc{Spore} allows the attacker to leverage multi-ranked tokens for faster and more accurate privacy extraction. We provide an information-theoretic analysis of \textsc{Spore} and show that it achieves high query efficiency with substantial per-query information leakage. Experiments on multiple frontier LLMs show that \textsc{Spore} achieves a higher attack success rate than existing state-of-the-art (SOTA) schemes, while maintaining low attack cost and remaining stable across different model parameter settings. We further evaluate the robustness of \textsc{Spore} against existing defense mechanisms. Our results show that \textsc{Spore} consistently bypasses both detection and strong safety alignment, demonstrating resilient performance across diverse defensive settings and realistic threat scenarios.