Large language model (LLM)-based AI agents have recently showcased impressive versatility by employing dynamic reasoning, an adaptive, multi-step process that coordinates with external tools. This shift from static, single-turn inference to agentic, multi-turn workflows broadens task generalization and behavioral flexibility, but it also raises serious concerns about system-level cost, efficiency, and sustainability. This paper presents the first comprehensive system-level analysis of AI agents, quantifying their resource usage, latency behavior, energy consumption, and datacenter-wide power demands across diverse agent designs and test-time scaling strategies. We further characterize how AI agent design choices, such as few-shot prompting, reflection depth, and parallel reasoning, affect accuracy-cost tradeoffs. Our findings reveal that while agents improve accuracy with increased compute, they suffer from rapidly diminishing returns, widening latency variance, and unsustainable infrastructure costs. Through detailed evaluation of representative agents, we highlight the profound computational demands introduced by AI agent workflows, uncovering a looming sustainability crisis. These results call for a paradigm shift in agent design toward compute-efficient reasoning, balancing performance with deployability under real-world constraints.