LLM-based agents are increasingly deployed to autonomously solve complex tasks, raising an urgent need for IP protection and regulatory provenance. While content watermarking effectively attributes LLM-generated outputs, it cannot directly identify the high-level planning behaviors (e.g., tool and subgoal choices) that govern multi-step execution. Critically, watermarking at the planning-behavior layer faces unique challenges: minor distributional deviations in decision-making can compound over long-horizon agent operation and degrade utility, and many agents operate as black boxes that are difficult to intervene in directly. To bridge this gap, we propose AgentMark, a behavioral watermarking framework that embeds multi-bit identifiers into planning decisions while preserving utility. It operates by eliciting an explicit behavior distribution from the agent and applying distribution-preserving conditional sampling, enabling deployment behind black-box APIs while remaining compatible with action-layer content watermarking. Experiments across embodied, tool-use, and social environments demonstrate practical multi-bit capacity, robust recovery from partial logs, and utility preservation. The code is available at https://github.com/Tooooa/AgentMark.
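To make the mechanism concrete, below is a minimal illustrative sketch of distribution-preserving conditional sampling that hides one message bit per planning step. This is not the paper's exact algorithm: the keyed-hash construction and the names `behavior_dist`, `secret_key`, `embed_bit`, and `decode_bit` are assumptions for illustration only.

```python
# Hypothetical sketch of distribution-preserving bit embedding at the planning layer.
# `behavior_dist` is the behavior distribution elicited from the agent, e.g.
# {action_name: probability}; `step_context` is any string identifying the step.

import hashlib


def _keyed_uniform(secret_key: str, step_context: str, bit: int) -> float:
    """Derive a pseudorandom number in [0, 1) from the key, step context, and bit."""
    digest = hashlib.sha256(f"{secret_key}|{step_context}|{bit}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def embed_bit(behavior_dist: dict, secret_key: str, step_context: str, bit: int) -> str:
    """Pick an action by inverse-CDF sampling with a keyed uniform draw.

    Because the draw is (pseudo)uniform for either bit value, the per-step action
    distribution follows `behavior_dist` (up to the pseudorandomness of the hash),
    which is what preserves the agent's behavior and utility.
    """
    u = _keyed_uniform(secret_key, step_context, bit)
    actions = sorted(behavior_dist)  # fixed ordering shared with the decoder
    cumulative = 0.0
    for action in actions:
        cumulative += behavior_dist[action]
        if u < cumulative:
            return action
    return actions[-1]  # guard against floating-point round-off


def decode_bit(behavior_dist: dict, secret_key: str, step_context: str,
               observed_action: str) -> int | None:
    """Recover the bit by checking which keyed draw reproduces the observed action.

    Returns None when the step is ambiguous (both bit values map to the same
    action) or inconsistent with the key.
    """
    matches = [b for b in (0, 1)
               if embed_bit(behavior_dist, secret_key, step_context, b) == observed_action]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    dist = {"search_web": 0.5, "call_calculator": 0.3, "ask_user": 0.2}
    key, ctx = "demo-key", "step-3|subgoal: verify price"
    chosen = embed_bit(dist, key, ctx, bit=1)
    print(chosen, decode_bit(dist, key, ctx, chosen))
```

In a sketch like this, a single step can be undecodable on its own, so a practical decoder would aggregate evidence over many planning steps of a trajectory, which is consistent with the abstract's claims of multi-bit capacity and robust recovery from partial logs.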