Today's AI agents are built on large language models (LLMs) equipped with tools to access and modify external environments, such as corporate file systems, API-accessible platforms and websites. AI agents promise to automate computer-based tasks across the economy. However, developers, researchers and governments lack an understanding of how AI agents are currently being used, and for which kinds of tasks, including consequential ones. To address this gap, we evaluated 177,436 agent tools created between November 2024 and February 2026 by monitoring public repositories of Model Context Protocol (MCP) servers; MCP is currently the predominant standard for agent tools. We categorise tools according to their direct impact: perception tools access and read data, reasoning tools analyse data or concepts, and action tools directly modify external environments, for example by editing files, sending emails or steering drones in the physical world. We use O*NET mapping to identify each tool's task domain and consequentiality. Software development accounts for 67% of all agent tools and 90% of MCP server downloads. Notably, the share of action tools rose from 27% to 65% of total usage over the 16-month period sampled. While most action tools support medium-stakes tasks such as editing files, some support higher-stakes tasks such as financial transactions. Using agentic financial transactions as an example, we demonstrate how governments and regulators can use this monitoring method to extend oversight beyond model outputs to the tool layer, and so monitor the risks of agent deployment.
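The three-way categorisation and the usage-share figure can be illustrated with a minimal sketch. It is not the classifier used in this work: the keyword lists, the `Tool` record, and the choice of server downloads as the usage weight are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical keyword lists (assumption): a crude stand-in for a real classifier.
ACTION_VERBS = ("write", "edit", "send", "delete", "create", "update", "pay", "transfer")
REASONING_VERBS = ("analyse", "analyze", "plan", "summarise", "summarize", "think")


@dataclass
class Tool:
    """Hypothetical record for one agent tool exposed by an MCP server."""
    name: str
    description: str
    downloads: int  # downloads of the hosting server, used here as a usage proxy


def categorise(tool: Tool) -> str:
    """Label a tool as 'action', 'reasoning' or 'perception' by substring match."""
    text = f"{tool.name} {tool.description}".lower()
    # Action is checked first: a tool that can modify its environment is
    # treated as an action tool even if it also reads or analyses data.
    if any(verb in text for verb in ACTION_VERBS):
        return "action"
    if any(verb in text for verb in REASONING_VERBS):
        return "reasoning"
    return "perception"


def usage_share(tools: list[Tool]) -> dict[str, float]:
    """Return each category's download-weighted share of total usage."""
    totals: Counter[str] = Counter()
    for tool in tools:
        totals[categorise(tool)] += tool.downloads
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: count / grand_total for category, count in totals.items()}
```

Running `usage_share` separately on the tools observed in each month would give a time series of the action share, comparable in spirit to the 27% to 65% trend reported above.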