Users rely on execution traces to observe agent behavior, diagnose failures, and ensure accountability. These traces contain rich procedural detail, including tool invocations, intermediate decisions, and error-recovery logic. Yet this detail can expose private procedural skills, allowing downstream methods to recover key formulas, thresholds, and strategies without access to model weights or skill files. To quantify this risk and evaluate protection, we construct CapTraceBench, a benchmark of 75 specialized long-horizon tasks and 154 curated skills across seven domains. We also introduce RedAct, a protected trace release framework that localizes protected key information, rewrites traces while preserving verifier-critical evidence, and embeds behavioral watermarks for downstream provenance analysis. Across representative trace reuse methods, RedAct reduces normalized skill transfer (NST) from 44.7-67.1% on raw traces to below the no-skill baseline, while preserving audit evidence. Its standalone behavioral watermarks reach 93.6-100.0% true detection with a false alarm rate of at most 1.9%. These results frame public agent traces as security interfaces and show that selective redaction can reduce procedural capability leakage without removing audit evidence.
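To make the watermark figures concrete, here is a minimal sketch of how a true detection rate and a false alarm rate are conventionally computed from per-trace detector decisions. It is not RedAct's detector: the function name `detection_rates`, its inputs, and the toy data are hypothetical, and the sketch assumes only that the detector emits a boolean decision for each trace.

```python
from typing import Sequence, Tuple


def detection_rates(
    flags_on_watermarked: Sequence[bool],
    flags_on_clean: Sequence[bool],
) -> Tuple[float, float]:
    """Return (true detection rate, false alarm rate).

    flags_on_watermarked: detector decisions on traces that carry a watermark.
    flags_on_clean: detector decisions on traces that carry no watermark.
    Both inputs are hypothetical; the abstract does not describe the detector.
    """
    if not flags_on_watermarked or not flags_on_clean:
        raise ValueError("both sets of decisions must be non-empty")
    # Fraction of watermarked traces that the detector flags.
    true_detection = sum(flags_on_watermarked) / len(flags_on_watermarked)
    # Fraction of clean traces that the detector flags by mistake.
    false_alarm = sum(flags_on_clean) / len(flags_on_clean)
    return true_detection, false_alarm


# Toy data for illustration only, not results from the paper.
tdr, far = detection_rates(
    [True, True, True, False],
    [False, False, False, False, True],
)
print(f"true detection rate = {tdr:.1%}, false alarm rate = {far:.1%}")
```

Under this reading, a reported range such as 93.6-100.0% true detection would correspond to the first value measured across different reuse settings, and the 1.9% bound to the largest second value.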