Finance LLM agents must simultaneously block prompt-induced unauthorized actions and approve legitimate multi-step business workflows. However, boundary filters often miss irreversible tool calls made mid-trajectory. Post-hoc LLM judges, for their part, audit a trace only after the agent terminates: this is too late to intervene, and their computational cost scales linearly with trace length. We present FinHarness, an inline safety harness that wraps a finance agent end-to-end with three components: a Query Monitor that fuses single-turn intent with cross-turn drift, a Tool Monitor that evaluates each prospective tool call, and a Cascade module that integrates per-step risk and adaptively routes verification between a lightweight and an advanced-tier LLM judge. Triggered risk factors are re-injected into the agent's input as ex-ante evidence, which lets the agent refuse, re-plan, or approve on its own. On FinVault, routed FinHarness cuts the attack success rate (ASR) from $38.3\%$ to $15.0\%$ while largely preserving the benign approval rate ($41.1\% \to 39.3\%$). It also makes $4.7\times$ fewer advanced-judge calls than an always-advanced ablation.
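The minimal Python sketch below shows how the three components might fit together. Everything in it is an illustrative assumption rather than FinHarness's actual implementation. That includes the names `query_monitor`, `tool_monitor`, `Cascade`, and `reinject`, the weighted-blend fusion, the toy tool heuristic, and the running-max risk integration. The thresholds and the stub judges that stand in for real LLM calls are also assumptions. The sketch reproduces only the control flow the abstract describes:

- per-step risk is integrated over the trajectory;
- the lightweight judge is consulted first;
- the advanced judge is called only when integrated risk is high or the lightweight verdict is uncertain;
- fired risk factors are prepended to the agent input as evidence rather than enforced as a hard block.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical judge interface: text -> (risk in [0, 1], confidence in [0, 1]).
Judge = Callable[[str], Tuple[float, float]]

# Illustrative assumption: tools whose effects cannot be undone.
IRREVERSIBLE_TOOLS = {"transfer_funds", "close_account"}


def query_monitor(intent_risk: float, drift_risk: float, w: float = 0.5) -> float:
    """Fuse single-turn intent risk with cross-turn drift risk (weighted blend, assumed)."""
    return w * intent_risk + (1 - w) * drift_risk


def tool_monitor(tool_name: str, args: Dict[str, float]) -> float:
    """Score a prospective tool call before it runs (toy heuristic, assumed)."""
    base = 0.6 if tool_name in IRREVERSIBLE_TOOLS else 0.1
    bump = 0.3 if args.get("amount", 0) > 10_000 else 0.0
    return min(1.0, base + bump)


@dataclass
class Cascade:
    light_judge: Judge
    advanced_judge: Judge
    escalate_at: float = 0.5  # integrated risk at or above this forces the advanced judge
    min_conf: float = 0.8     # light verdicts below this confidence also escalate
    fire_at: float = 0.5      # scores at or above this count as "fired"
    step_risks: List[float] = field(default_factory=list)
    advanced_calls: int = 0

    def step(self, text: str, factors: Dict[str, float]) -> Tuple[float, List[str]]:
        # Integrate per-step risk over the trajectory (running max, assumed).
        self.step_risks.append(max(factors.values(), default=0.0))
        integrated = max(self.step_risks)
        # Cheap judge first; escalate only when risk is high or the verdict is uncertain.
        risk, conf = self.light_judge(text)
        if integrated >= self.escalate_at or conf < self.min_conf:
            self.advanced_calls += 1
            risk, _ = self.advanced_judge(text)
        fired = [name for name, score in factors.items() if score >= self.fire_at]
        if risk >= self.fire_at:
            fired.append("judge")
        return risk, fired


def reinject(agent_input: str, fired: List[str]) -> str:
    """Prepend fired factors as ex-ante evidence; the agent still makes the decision."""
    if not fired:
        return agent_input
    return f"[risk evidence: {', '.join(fired)}]\n{agent_input}"


# Stub judges standing in for real LLM calls (hypothetical).
def light_judge(text: str) -> Tuple[float, float]:
    return 0.2, 0.9


def advanced_judge(text: str) -> Tuple[float, float]:
    return 0.9, 0.99


cascade = Cascade(light_judge, advanced_judge)
factors = {
    "query": query_monitor(intent_risk=0.2, drift_risk=0.6),
    "tool": tool_monitor("transfer_funds", {"amount": 50_000}),
}
risk, fired = cascade.step("transfer_funds(amount=50000)", factors)
print(reinject("Proceed with the transfer?", fired))
```

The advanced judge is reached only through escalation, so benign low-risk steps never pay for it. That is the general mechanism by which a cascade can reduce advanced-judge calls relative to an always-advanced baseline. The running-max aggregation is a deliberately conservative choice in this sketch: once a trajectory produces a high-risk step, every later step escalates.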