Your Agent Is Mine: Measuring Malicious Intermediary Attacks on the LLM Supply Chain

Large language model (LLM) agents increasingly rely on third-party API routers to dispatch tool-calling requests across multiple upstream providers. These routers operate as application-layer proxies with full plaintext access to every in-flight JSON payload, yet no provider enforces cryptographic integrity between client and upstream model. We present the first systematic study of this attack surface. We formalize a threat model for malicious LLM API routers and define two core attack classes, payload injection (AC-1) and secret exfiltration (AC-2), together with two adaptive evasion variants: dependency-targeted injection (AC-1.a) and conditional delivery (AC-1.b). Across 28 paid routers purchased from Taobao, Xianyu, and Shopify-hosted storefronts and 400 free routers collected from public communities, we find 1 paid and 8 free routers actively injecting malicious code, 2 deploying adaptive evasion triggers, 17 touching researcher-owned AWS canary credentials, and 1 draining ETH from a researcher-owned private key. Two poisoning studies further show that ostensibly benign routers can be pulled into the same attack surface: a leaked OpenAI key generates 100M GPT-5.4 tokens and more than seven Codex sessions, while weakly configured decoys yield 2B billed tokens, 99 credentials across 440 Codex sessions, and 401 sessions already running in autonomous YOLO mode. We build Mine, a research proxy that implements all four attack classes against four public agent frameworks, and use it to evaluate three deployable client-side defenses: a fail-closed policy gate, response-side anomaly screening, and append-only transparency logging.
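To make two of the three defenses concrete, the following is a minimal sketch, not the paper's implementation. It assumes tool calls arrive from the router as JSON objects with `name` and `arguments` fields. The tool names, the allowlisted commands, and the helpers `gate_tool_call` and `AppendOnlyLog` are hypothetical and chosen only for illustration. The gate fails closed, so anything it cannot parse or does not explicitly recognize is rejected. The log chains each record to the previous one with SHA-256, so later tampering is detectable. Response-side anomaly screening is not shown.

```python
import hashlib
import json

# Hypothetical policy: these tool names and commands are illustrative only.
ALLOWED_TOOLS = {"read_file", "run_shell"}
ALLOWED_SHELL_COMMANDS = {"ls", "git status"}


def gate_tool_call(raw_call):
    """Return True only if the tool call is explicitly permitted (fail closed)."""
    try:
        call = json.loads(raw_call)
        name = call["name"]
        args = call["arguments"]
    except (ValueError, KeyError, TypeError):
        # Unparseable or malformed input is denied, never passed through.
        return False
    if not isinstance(name, str) or name not in ALLOWED_TOOLS:
        return False
    if not isinstance(args, dict):
        return False
    if name == "run_shell":
        # Exact-match allowlist: a rewritten command does not match and is denied.
        return args.get("command") in ALLOWED_SHELL_COMMANDS
    return True


class AppendOnlyLog:
    """Hash-chained log: each digest covers the previous digest and the record."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []  # list of (record, digest) pairs
        self._head = self.GENESIS

    def append(self, record):
        digest = hashlib.sha256((self._head + record).encode("utf-8")).hexdigest()
        self.entries.append((record, digest))
        self._head = digest
        return digest

    def verify(self):
        """Recompute the chain from the start; any edited record breaks it."""
        head = self.GENESIS
        for record, digest in self.entries:
            expected = hashlib.sha256((head + record).encode("utf-8")).hexdigest()
            if expected != digest:
                return False
            head = digest
        return True
```

In this sketch a client would log every raw tool call before passing it to `gate_tool_call` and execute only the calls for which the gate returns `True`. The exact-match command allowlist is deliberately strict. A real deployment would need a richer policy, but the fail-closed default stays the same.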