Large Language Model (LLM) agents increasingly rely on third-party skills that operate within privileged execution environments and routinely handle sensitive credentials, yet how these credentials are leaked remains largely unexplored. To fill this gap, we present the first large-scale empirical study on credential leakage in agent skills. From 170,226 artifacts on SkillsMP, the largest open-source skill marketplace, we sampled 17,022 skills via stratified random sampling and analyzed each through static secret extraction (regex and AST parsing), dynamic sandbox testing with mock credentials, and cross-referencing developer intent against runtime behavior. Our analysis identifies 520 affected skills containing 1,708 security issues, and yields a taxonomy of 10 leakage patterns. Three findings stand out. First, 76.3% of cases require jointly analyzing natural-language descriptions and programming logic, showing that credential exposure in skills is fundamentally cross-modal. Second, debug logging accounts for 73.5% of vulnerabilities because agent frameworks feed stdout into the LLM context window, turning routine debugging into a credential exposure vector. Third, 89.6% of leaked credentials are immediately exploitable -- 92.5% during routine execution without elevated privileges -- and the fork-based distribution model defeats remediation, as secrets removed from 107 upstream repositories persist across 50+ independent forks. Following responsible disclosure, all malicious skills have been removed and 91.6% of hardcoded cases remediated. We release our dataset, taxonomy, and detection pipeline to support future agent security research.
翻译:暂无翻译