LLM-based chatbot agents increasingly process user requests by combining natural-language reasoning with external tools such as web browsing. These capabilities improve usability, but they also create attack surfaces when untrusted external content is processed as part of a user's task. This paper studies a privacy-leakage attack chain based on indirect prompt injection in black-box chatbot environments, where the attacker has no access to model weights, system prompts, or agent implementation details, including how the agent manages its trajectory while processing a query. We first analyze how an attacker can hijack an agent's intended task by crafting external content that appears benign to the victim while inducing the agent to execute an attacker-defined objective. We then evaluate a new prompt-injection technique, called exemplification, which uses a bridge in the external content to reframe the user prompt and the benign beginning of the retrieved page as few-shot examples before appending the attacker's objective. We compare its attack success rate with that of a prior fake-completion technique. Finally, we demonstrate a proof-of-concept data-exfiltration chain using fictitious personal information in a controlled setting. Our results suggest that prompt injection, jailbreak-style instruction steering, and web-tool invocation can be combined into a feasible privacy-leakage path in deployed chatbot agents.