"Do Not Mention This to the User": Detecting and Understanding Malicious Agent Skills

LLM-based coding agents increasingly rely on third-party extensions called skills, which bundle natural language instructions and helper scripts that execute with full user privileges. Community registries have emerged to distribute these skills, but the security implications remain unstudied due to the absence of labeled threat data. This paper presents a systematic security analysis of 98,380 skills collected from two major registries. Through a combination of static pattern matching and dynamic behavioral verification, we identify 157 skills exhibiting confirmed malicious behavior, encompassing 632 distinct vulnerabilities across 13 attack techniques. Our analysis reveals that these threats are deliberate rather than accidental: each malicious skill contains an average of 4.03 vulnerabilities spanning multiple attack phases. We identify two dominant attack strategies with statistically significant negative correlation -- credential theft via remote code execution, and agent manipulation through adversarial instructions embedded in documentation. Over half of all confirmed cases originate from a single threat actor employing templated brand impersonation at scale. We further observe that attack sophistication correlates with concealment investment, with advanced skills universally employing undocumented capabilities while also exploiting platform-native trust mechanisms. Following responsible disclosure, registry maintainers removed all 157 (100%) of the reported skills. Our dataset and detection pipeline are publicly available to facilitate future research on securing LLM agent ecosystems.

翻译：基于大语言模型的编码代理日益依赖于称为“技能”的第三方扩展，这些技能将自然语言指令与以完整用户权限执行的辅助脚本捆绑在一起。社区注册中心已涌现以分发这些技能，但缺乏标注威胁数据使得其安全影响尚未得到研究。本文对从两个主要注册中心收集的98,380个技能进行了系统性安全分析。通过静态模式匹配与动态行为验证相结合，我们识别出157个表现出已确认恶意行为的技能，涵盖13种攻击技术中的632个不同漏洞。分析表明这些威胁是蓄意而非偶然：每个恶意技能平均包含4.03个跨多个攻击阶段的漏洞。我们识别出两种具有统计显著负相关性的主导攻击策略——通过远程代码执行窃取凭证，以及通过嵌入在文档中的对抗性指令操纵代理。超过一半的已确认案例源自一个采用大规模模板化品牌冒充的单一威胁行为者。我们进一步观察到攻击复杂度与隐藏投资相关，高级技能普遍采用未文档化功能，同时利用平台原生信任机制。经负责任的披露后，注册中心维护者已移除所有157个（100%）被报告技能。我们的数据集与检测流程公开可用，以促进未来关于保障LLM代理生态系统安全的研究。