This paper advances a methodological proposal for safety research in agentic AI. As systems acquire planning, memory, tool use, persistent identity, and sustained interaction, safety can no longer be analysed primarily at the level of the isolated model. Population-level risks arise from structured interaction among agents, through processes of communication, observation, and mutual influence that shape collective behaviour over time. This shift in the object of analysis exposes a methodological gap: approaches focused either on single agents or on aggregate outcomes identify neither the interaction-level mechanisms that generate collective risks nor the design variables that control them. A framework is required that links local interaction structure to population-level dynamics in a causally explicit way, allowing both explanation and intervention. We introduce two linked concepts. Agentic microphysics defines the level of analysis: local interaction dynamics in which one agent's output becomes another's input under specific protocol conditions. Generative safety defines the methodology: growing phenomena and eliciting risks from micro-level conditions in order to identify sufficient mechanisms, detect thresholds, and design effective interventions.