LLM-based agents are increasingly deployed to autonomously solve complex tasks, raising urgent needs for IP protection and regulatory provenance. While content watermarking effectively attributes LLM-generated outputs, it fails to directly identify the high-level planning behaviors (e.g., tool and subgoal choices) that govern multi-step execution. Critically, watermarking at the planning-behavior layer faces unique challenges: minor distributional deviations in decision-making can compound during long-term agent operation, degrading utility, and many agents operate as black boxes that are difficult to intervene in directly. To bridge this gap, we propose AgentMark, a behavioral watermarking framework that embeds multi-bit identifiers into planning decisions while preserving utility. It operates by eliciting an explicit behavior distribution from the agent and applying distribution-preserving conditional sampling, enabling deployment under black-box APIs while remaining compatible with action-layer content watermarking. Experiments across embodied, tool-use, and social environments demonstrate practical multi-bit capacity, robust recovery from partial logs, and utility preservation. The code is available at https://github.com/Tooooa/AgentMark.
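To make the idea of distribution-preserving conditional sampling concrete, the following is a minimal sketch of one generic way to embed a single bit per planning step without changing the behavior distribution. It is not taken from the AgentMark codebase. The function names (`keyed_uniform`, `inverse_cdf`, `embed_bit`, `extract_bit`), the HMAC-based key stream, and the half-interval shift are illustrative assumptions, and the behavior distribution `probs` is assumed to have already been elicited from the agent.

```python
import hashlib
import hmac
from bisect import bisect_right
from itertools import accumulate


def keyed_uniform(key: bytes, step: int) -> float:
    """Pseudo-random value in [0, 1) derived from a secret key and step index."""
    digest = hmac.new(key, step.to_bytes(8, "big"), hashlib.sha256).digest()
    # 48 bits fit exactly in a double, so the result is always below 1.0.
    return int.from_bytes(digest[:6], "big") / 2**48


def inverse_cdf(probs: list[float], u: float) -> int:
    """Index i of the action whose cumulative interval contains u."""
    cumulative = list(accumulate(probs))
    # The min() guards against cumulative sums that fall slightly short of 1.0.
    return min(bisect_right(cumulative, u), len(probs) - 1)


def embed_bit(probs: list[float], key: bytes, step: int, bit: int) -> int:
    """Choose an action that carries `bit` (hypothetical scheme).

    The keyed value u is uniform, so (u + bit / 2) mod 1 is uniform for either
    bit value, and the chosen action is still distributed according to `probs`.
    """
    u = keyed_uniform(key, step)
    return inverse_cdf(probs, (u + 0.5 * bit) % 1.0)


def extract_bit(probs: list[float], key: bytes, step: int, action: int):
    """Recover the bit from a logged action, or None if it cannot be decided."""
    u = keyed_uniform(key, step)
    candidates = [
        b for b in (0, 1) if inverse_cdf(probs, (u + 0.5 * b) % 1.0) == action
    ]
    # Exactly one candidate: the bit is recovered. Two: both shifts map to the
    # same action, so this step carries no information. Zero: the log entry is
    # inconsistent with the key and distribution.
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    key = b"demo-secret-key"        # assumed shared secret
    probs = [0.5, 0.3, 0.2]         # assumed elicited distribution over 3 tools
    message = [1, 0, 1, 1, 0, 0, 1, 0]
    actions = [embed_bit(probs, key, t, b) for t, b in enumerate(message)]
    decoded = [extract_bit(probs, key, t, a) for t, a in enumerate(actions)]
    print(actions, decoded)
```

Because each step is decoded independently from the key, the step index, and the logged action, missing steps in a partial log only erase the bits they carried. A practical multi-bit scheme would likely add error correction across steps, and a step yields no bit whenever both shifted values select the same action.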