Mechanical Enforcement for LLM Governance: Evidence of Governance-Task Decoupling in Financial Decision Systems

Large language models in regulated financial workflows are governed by natural-language policies that the same model interprets, creating a principal--agent failure: outputs can appear compliant without being compliant. Existing evaluation measures task accuracy but not whether governance constrains behaviour at the decision rationale level -- where regulated decisions must be auditable. We introduce five governance metrics that quantify policy compliance at the rationale level and apply them in a synthetic banking domain to compare text-only governance against mechanical enforcement: four primitives operating outside the model's interpretive loop. Under text-only governance, 27% of deferrals carry no decision-relevant information. Mechanical enforcement reduces this rate by 73%, more than doubles deferral information content, and raises task accuracy from MCC~$0.43$ to $0.88$. The improvement is driven by architectural separation: LLM-generated rationales under mechanical enforcement show comparable CDL to text-only governance -- the gain comes from removing clear-cut decisions from the model's control. A causal ablation confirms that each primitive is individually necessary. Our central finding is a governance-task decoupling: under structural stress, text-only governance degrades on both dimensions simultaneously, whereas mechanical enforcement preserves governance quality even as task performance drops. This implies that governance and task evaluation are distinct axes: accuracy is not a sufficient proxy for governance in regulated AI systems.
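For readers unfamiliar with the task-accuracy figure quoted above, the following is a minimal sketch of how a Matthews correlation coefficient (MCC) is computed from binary confusion-matrix counts. The function name `mcc` and the example counts are illustrative assumptions; they are not taken from the paper, whose evaluation setup is not described in the abstract.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when any row or column total is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Illustrative counts only (not data from the paper).
perfect = mcc(tp=50, tn=50, fp=0, fn=0)
chance = mcc(tp=25, tn=25, fp=25, fn=25)
print(perfect, chance)
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level agreement, which is why it is often preferred over raw accuracy when classes are imbalanced.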

