Evaluating Tool Cloning in Agentic-AI Ecosystems

Agent tools are becoming a core interface through which LLM agents access external data, services, and execution environments. As these tools are distributed through public marketplaces, raw tool counts may substantially overstate ecosystem diversity if many repositories are cloned, lightly modified, or derived from shared templates. Such hidden duplication can contaminate benchmark splits, propagate vulnerable implementations, bias measurements of tool-use generalization, and raise provenance, attribution, and intellectual-property concerns. We present, to our knowledge, the first large-scale measurement study of tool cloning in agentic AI ecosystems. We curate a unified dataset from multiple public platforms, covering 7,508 Model Context Protocol (MCP) repositories with 87,564 extracted tools and 1,353 Skills repositories with 12,447 tools, for a total of 8,861 repositories and 100,011 tool entries. To measure implementation-level duplication, we build a repository-level auditing pipeline using complementary lexical (Jaccard) and fuzzy-structural (ssdeep) similarity metrics, and compute pairwise similarity across MCP-to-MCP, Skills-to-Skills, and MCP-to-Skills repository pairs. We further manually verify 100 sampled pairs from each of the MCP and Skills ecosystems, drawn across similarity-score buckets, to calibrate how often high similarity reflects true code cloning. Our analysis shows that cloning is not an isolated artifact: high-similarity regions appear across all comparison settings, and 60% of high-Jaccard candidates and 85% of high-ssdeep candidates in the MCP ecosystem are manually verified as clones. These results indicate that tool cloning is a pervasive and severe source of hidden duplication in agent-tool ecosystems. They further suggest that agent-tool datasets and benchmarks should account for repository provenance and implementation similarity when measuring tool diversity or constructing evaluation splits.
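
To make the lexical side of such an audit concrete, the following is a minimal sketch of pairwise Jaccard similarity over repository token sets. It is an illustration only, not the study's pipeline: the tokenizer, the `candidate_pairs` helper, the representation of a repository as a single source string, and the 0.8 default threshold are all assumptions made for this example. The fuzzy-structural (ssdeep) metric is not shown.

```python
from __future__ import annotations

import re
from itertools import combinations

# Assumption: identifier-like tokens stand in for a repository's lexical content.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def token_set(source: str) -> set[str]:
    """Return the set of lowercased identifier-like tokens in a source string."""
    return {token.lower() for token in TOKEN_RE.findall(source)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_pairs(
    repos: dict[str, str], threshold: float = 0.8
) -> list[tuple[str, str, float]]:
    """List repository pairs whose token-set Jaccard score reaches the threshold.

    `repos` maps a hypothetical repository name to its concatenated source text.
    The threshold is an illustrative default, not a value taken from the study.
    """
    tokens = {name: token_set(src) for name, src in repos.items()}
    flagged = []
    for left, right in combinations(sorted(tokens), 2):
        score = jaccard(tokens[left], tokens[right])
        if score >= threshold:
            flagged.append((left, right, score))
    return flagged


if __name__ == "__main__":
    demo = {
        "repo_a": "def get_weather(city): return fetch(city)",
        "repo_b": "def get_weather(town): return fetch(town)",
        "repo_c": "class Calendar: pass",
    }
    for left, right, score in candidate_pairs(demo, threshold=0.6):
        print(f"{left} ~ {right}: {score:.2f}")
```

Pairs flagged this way are only candidates. As the manual verification results indicate, a high score does not always correspond to a true clone, so a threshold of this kind would need calibrating against labeled samples.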
