As LLM-based agents increasingly browse the web on users' behalf, a natural question arises: can websites passively identify which underlying model powers an agent? Doing so would represent a significant security risk, enabling targeted attacks tailored to known model vulnerabilities. Across 14 frontier LLMs and four web environments spanning information retrieval and shopping tasks, we show that an agent's actions and interaction timings, captured via a passive JavaScript tracker, are sufficient to identify the underlying model with up to 96\% F1. We formalise this attack surface by demonstrating that classifiers trained on agent actions generalise across model sizes and families. We further show that strong classifiers can be trained from few interaction traces and that agent identity can be inferred early within an episode. Injecting randomised timing delays between actions substantially degrades classifier performance, but does not provide robust protection: a classifier retrained on delayed traces largely recovers performance. We release our harness and a labelled corpus of agent traces \href{https://github.com/KabakaWilliam/known_actions}{here}.