AI is moving from domain-specific autonomy in closed, predictable settings to large-language-model-driven agents that plan and act in open, cross-organizational environments. As a result, the cybersecurity risk landscape is changing in fundamental ways. Agentic AI systems can plan, act, collaborate, and persist over time, functioning as participants in complex socio-technical ecosystems rather than as isolated software components. Although recent work has strengthened defenses against model- and pipeline-level vulnerabilities such as prompt injection, data poisoning, and tool misuse, these system-centric approaches may fail to capture risks that arise from autonomy, interaction, and emergent behavior. This article introduces the 4C Framework for multi-agent AI security, inspired by societal governance. It organizes agentic risks across four interdependent dimensions: Core (system, infrastructure, and environmental integrity), Connection (communication, coordination, and trust), Cognition (belief, goal, and reasoning integrity), and Compliance (ethical, legal, and institutional governance). By shifting AI security from a narrow focus on system-centric protection to the broader preservation of behavioral integrity and intent, the framework complements existing AI security strategies and offers a principled foundation for building agentic AI systems that are trustworthy, governable, and aligned with human values.