Artificial Intelligence (AI) agents have evolved from passive predictive tools into active entities capable of autonomous decision-making and environmental interaction, driven by the reasoning capabilities of Large Language Models (LLMs). However, this evolution has introduced critical security vulnerabilities that existing frameworks fail to address. The Hierarchical Autonomy Evolution (HAE) framework organizes agent security into three tiers: Cognitive Autonomy (L1) targets internal reasoning integrity; Execution Autonomy (L2) covers tool-mediated environmental interaction; and Collective Autonomy (L3) addresses systemic risks in multi-agent ecosystems. We present a taxonomy of threats spanning cognitive manipulation, physical environment disruption, and multi-agent systemic failures, and we evaluate existing defenses while identifying key research gaps. These findings are intended to guide the development of multilayered, autonomy-aware defense architectures for trustworthy AI agent systems.