Artificial Intelligence is moving from models that only generate text to Agentic AI, in which systems behave as autonomous entities that perceive, reason, plan, and act. Large Language Models (LLMs) are no longer used only as passive knowledge engines but as cognitive controllers that combine memory, tool use, and environmental feedback to pursue long-horizon goals. This shift already supports the automation of complex workflows in software engineering, scientific discovery, and web navigation, yet the variety of emerging designs, from simple single-loop agents to hierarchical multi-agent systems, makes the landscape hard to navigate. In this paper, we investigate agent architectures and propose a unified taxonomy that decomposes an agent into six components: Perception, Brain, Planning, Action, Tool Use, and Collaboration. We use this lens to describe the move from linear reasoning procedures to native inference-time reasoning models, and the transition from fixed API calls to open standards like the Model Context Protocol (MCP) and Native Computer Use. We also categorize the environments in which these agents operate, including digital operating systems, embodied robotics, and other specialized domains, and we review current evaluation practices. Finally, we highlight open challenges, such as hallucination in action, infinite loops, and prompt injection, and outline future research directions toward more robust and reliable autonomous systems.