LLM agents rely heavily on high-quality trajectory data to guide their problem-solving behaviors, yet producing such data requires substantial task design, high-capacity model generation, and manual filtering. Despite the high cost of creating these datasets, existing literature has overlooked copyright protection for LLM agent trajectories. This gap leaves creators vulnerable to data theft and makes it difficult to trace misuse or enforce ownership rights. This paper introduces ActHook, the first watermarking method tailored for agent trajectory datasets. Inspired by hook mechanisms in software engineering, ActHook embeds hook actions that are activated by a secret input key and do not alter the original task outcome. Like software execution, LLM agents operate sequentially, allowing hook actions to be inserted at decision points without disrupting task flow. When the activation key is present, an LLM agent trained on watermarked trajectories can produce these hook actions at a significantly higher rate, enabling reliable black-box detection. Experiments on mathematical reasoning, web searching, and software engineering agents show that ActHook achieves an average detection AUC of 94.3 on Qwen-2.5-Coder-7B while incurring negligible performance degradation.
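The black-box detection idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the detection statistic, so everything here is an assumption made for illustration: the score (hook-action rate with the key minus the rate without it), the names `HOOK_ACTION`, `SECRET_KEY`, `hook_rate`, `detection_score`, and `auc`, and the two toy agents. The toy agents are deterministic stand-ins, not trained models, and their outputs are not results from the paper.

```python
from typing import Callable, Sequence

# Hypothetical placeholders; the real hook action and key are not given in the abstract.
HOOK_ACTION = "noop_check()"
SECRET_KEY = "<<key-7f3a>>"

Agent = Callable[[str], Sequence[str]]  # maps a prompt to a list of actions


def hook_rate(agent: Agent, prompts: Sequence[str], key: str = "") -> float:
    """Fraction of prompts whose trajectory contains the hook action."""
    hits = 0
    for prompt in prompts:
        actions = agent(f"{key} {prompt}".strip())
        hits += HOOK_ACTION in actions
    return hits / len(prompts)


def detection_score(agent: Agent, prompts: Sequence[str], key: str) -> float:
    """Assumed statistic: hook rate with the key minus hook rate without it."""
    return hook_rate(agent, prompts, key) - hook_rate(agent, prompts)


def auc(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """Pairwise AUC: chance a positive outscores a negative (ties count 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def watermarked_agent(prompt: str) -> Sequence[str]:
    """Toy stand-in for an agent trained on watermarked trajectories."""
    actions = ["search", "answer"]
    if SECRET_KEY in prompt:
        actions.insert(1, HOOK_ACTION)  # hook fires only when the key is present
    return actions


def clean_agent(prompt: str) -> Sequence[str]:
    """Toy stand-in for an agent that never saw the watermark."""
    return ["search", "answer"]


prompts = ["task A", "task B", "task C"]
wm_score = detection_score(watermarked_agent, prompts, SECRET_KEY)
clean_score = detection_score(clean_agent, prompts, SECRET_KEY)
print(wm_score, clean_score)  # prints "1.0 0.0" for these deterministic toy agents
```

In practice a trained agent emits the hook action only probabilistically, so scores from several suspect and clean models would be passed to `auc`. The function returns a value in [0, 1], whereas the 94.3 reported in the abstract appears to be on a 0 to 100 scale.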