Large language models in regulated financial workflows are governed by natural-language policies that the same model interprets, creating a principal--agent failure: outputs can appear compliant without being compliant. Existing evaluations measure task accuracy but not whether governance constrains behaviour at the level of the decision rationale, where regulated decisions must be auditable. We introduce five governance metrics that quantify policy compliance at the rationale level and apply them in a synthetic banking domain to compare text-only governance against mechanical enforcement, a set of four primitives operating outside the model's interpretive loop. Under text-only governance, 27% of deferrals carry no decision-relevant information. Mechanical enforcement reduces this rate by 73%, more than doubles deferral information content, and raises task accuracy from MCC~$0.43$ to $0.88$. The improvement is driven by architectural separation: LLM-generated rationales under mechanical enforcement show CDL comparable to that under text-only governance, so the gain comes from removing clear-cut decisions from the model's control. A causal ablation confirms that each primitive is individually necessary. Our central finding is a governance--task decoupling: under structural stress, text-only governance degrades on both dimensions simultaneously, whereas mechanical enforcement preserves governance quality even as task performance drops. Governance and task evaluation are therefore distinct axes: accuracy is not a sufficient proxy for governance in regulated AI systems.
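For reference, MCC here presumably denotes the Matthews correlation coefficient; the abstract does not expand the acronym, so this reading is an assumption. For a binary decision with confusion-matrix counts $TP$, $TN$, $FP$ and $FN$ (symbols introduced here for illustration only), the standard definition is

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)\,(TP+FN)\,(TN+FP)\,(TN+FN)}}.$$

It ranges from $-1$ to $1$, with $0$ corresponding to chance-level prediction and $1$ to perfect agreement.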