Agent skills extend local AI agents, such as Claude Code or Open Claw, with additional functionality, and their popularity has led to the emergence of dedicated skill marketplaces, similar to app stores for mobile applications. Simultaneously, automated skill scanners have been introduced that analyze the skill description provided in SKILL.md to verify benign behavior. For individual marketplaces, these scanners flag up to 46.8% of skills as malicious. In this paper, we present the largest empirical security analysis of the AI agent skill ecosystem to date, questioning this high rate of malicious classifications. To this end, we collect 238,180 unique skills from three major distribution platforms and GitHub and systematically analyze their type and behavior. Our approach substantially reduces the share of skills flagged as non-benign by security scanners to only 0.52%, namely those that remain in repositories flagged as malicious. Consequently, our methodology substantially reduces false positives and provides a more robust view of the ecosystem's current risk surface. Beyond that, we extend the security analysis from the mere investigation of the skill description to a comparison of its congruence with the GitHub repository in which the skill is embedded, providing additional context. Furthermore, our analysis uncovers several previously undocumented real-world attack vectors, namely the hijacking of skills hosted on abandoned GitHub repositories.