"Do Not Mention This to the User": Detecting and Understanding Malicious Agent Skills in the Wild

LLM-based coding agents increasingly rely on third-party extensions called skills, which bundle natural language instructions and helper scripts that execute with full user privileges. Community registries have emerged to distribute these skills, but the security implications remain unstudied due to the absence of labeled threat data. This paper presents a systematic security analysis of 98,380 skills collected from two major registries. Through a combination of static pattern matching and dynamic behavioral verification, we identify 157 skills exhibiting confirmed malicious behavior, encompassing 632 distinct vulnerabilities across 13 attack techniques. Our analysis reveals that these threats are deliberate rather than accidental: each malicious skill contains an average of 4.03 vulnerabilities spanning multiple attack phases. We identify two dominant attack strategies with statistically significant negative correlation -- credential theft via remote code execution, and agent manipulation through adversarial instructions embedded in documentation. Over half of all confirmed cases originate from a single threat actor employing templated brand impersonation at scale. We further observe that attack sophistication correlates with concealment investment, with advanced skills universally employing undocumented capabilities while also exploiting platform-native trust mechanisms. Following responsible disclosure, registry maintainers removed all 157 (100%) of the reported skills. Our dataset and detection pipeline are publicly available to facilitate future research on securing LLM agent ecosystems.

翻译：基于大语言模型的编码代理越来越依赖称为“技能”的第三方扩展，这些技能将自然语言指令与拥有完整用户权限的帮助脚本捆绑在一起。社区注册中心应运而生，用于分发这些技能，但由于缺乏标注威胁数据，其安全影响尚未得到研究。本文对从两大主要注册中心收集的98,380个技能进行了系统性安全分析。通过静态模式匹配与动态行为验证相结合，我们识别出157个表现出确认恶意行为的技能，涵盖13种攻击技术中的632个不同漏洞。我们的分析表明，这些威胁是蓄意而非偶然的：每个恶意技能平均包含4.03个跨越多个攻击阶段的漏洞。我们识别出两种具有统计显著负相关性的主要攻击策略——通过远程代码执行窃取凭证，以及通过嵌入文档中的对抗性指令操纵代理。超过半数已确认案例源自一个采用模板化品牌冒充进行规模化攻击的单一威胁行为者。我们进一步观察到，攻击复杂性与隐藏投入呈正相关，高级技能普遍使用未记录功能，同时利用平台原生的信任机制。经过负责任的披露，注册中心维护者已移除所有157个（100%）被报告的技能。我们的数据集和检测流程已公开，以促进未来对保护大语言模型代理生态系统安全的研究。