Large language model (LLM) based agents are increasingly used to automate financial transactions, yet their reliance on contextual reasoning exposes payment systems to prompt-driven manipulation. The Agent Payments Protocol (AP2) aims to secure agent-led purchases through cryptographically verifiable mandates, but its practical robustness remains underexplored. In this work, we perform an AI red-teaming evaluation of AP2 and identify vulnerabilities arising from indirect and direct prompt injection. We introduce two attack techniques, the Branded Whisper Attack and the Vault Whisper Attack which manipulate product ranking and extract sensitive user data. Using a functional AP2 based shopping agent built with Gemini-2.5-Flash and the Google ADK framework, we experimentally validate that simple adversarial prompts can reliably subvert agent behavior. Our findings reveal critical weaknesses in current agentic payment architectures and highlight the need for stronger isolation and defensive safeguards in LLM-mediated financial systems.
翻译:基于大型语言模型(LLM)的智能体正日益广泛地用于自动化金融交易,然而其对上下文推理的依赖使得支付系统暴露于提示驱动的操纵风险之下。代理支付协议(AP2)旨在通过密码学可验证的授权来保障智能体主导的购买行为,但其实际鲁棒性仍有待深入探究。本研究对AP2进行了人工智能红队测试评估,并识别出由间接和直接提示注入引发的安全漏洞。我们提出了两种攻击技术——品牌窃语攻击与金库窃语攻击,它们分别能操纵产品排名并窃取敏感用户数据。通过使用基于Gemini-2.5-Flash和谷歌ADK框架构建的功能性AP2购物智能体,我们通过实验验证了简单的对抗性提示能够可靠地颠覆智能体行为。我们的研究揭示了当前代理支付架构中的关键缺陷,并强调了在LLM介导的金融系统中加强隔离与防御性保障措施的迫切需求。