interwhen: A Generalizable Framework for Verifiable Reasoning with Test-time Monitors

We present interwhen, a test-time verification framework that ensures that the output of a reasoning model is valid with respect to a given set of verifiers. Verified reasoning is an important goal in high-stakes scenarios such as deploying agents in the physical world or in domains such as law and finance. However, current techniques either rely on the generate-test paradigm, which verifies only after the final answer is produced, or verify partial output through a step-extraction paradigm, in which task execution is externally broken down into structured steps. The former is inefficient, while the latter artificially restricts a model's problem-solving strategies. Instead, we propose to verify a model's reasoning trace as-is, taking full advantage of the model's reasoning capabilities while verifying and steering its output only when needed. The key idea is meta-prompting: identifying the verifiable properties that any partial solution should satisfy, and then prompting the model to follow a custom format in its trace so that partial outputs can be easily parsed and checked. We consider both self-verification and external verification and find that interwhen is a useful abstraction for providing feedback to and steering reasoning models in each case. Using self-verification, interwhen obtains state-of-the-art results on early stopping for reasoning models, without any loss in accuracy. Using external verifiers, interwhen obtains a 10 percentage point improvement in accuracy over test-time scaling methods, while ensuring 100% soundness and being 4x more efficient. The code for interwhen is available at https://github.com/microsoft/interwhen
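As a rough illustration of the monitoring idea described above, the following minimal Python sketch watches a streamed reasoning trace for steps written in an agreed format, checks each completed step with a verifier, and returns feedback at the first failure. Everything in it is an assumption made for illustration: the `<step>` tag format, the names `monitor_trace` and `check_arithmetic`, and the toy arithmetic verifier are hypothetical and are not taken from the interwhen codebase.

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple

# Hypothetical trace format: the model is prompted to wrap each checkable
# partial result in <step>...</step> tags. This format is an assumption.
STEP_PATTERN = re.compile(r"<step>(.*?)</step>", re.DOTALL)

# A verifier returns None if the step is valid, else an error message.
Verifier = Callable[[str], Optional[str]]


def check_arithmetic(step: str) -> Optional[str]:
    """Toy external verifier: checks steps of the form 'a op b = c'."""
    m = re.fullmatch(r"(-?\d+)\s*([+*-])\s*(-?\d+)\s*=\s*(-?\d+)", step)
    if m is None:
        return "could not parse step"
    a, op, b, claimed = int(m[1]), m[2], int(m[3]), int(m[4])
    actual = {"+": a + b, "-": a - b, "*": a * b}[op]
    return None if actual == claimed else f"expected {actual}, got {claimed}"


def monitor_trace(
    chunks: Iterable[str], verifier: Verifier
) -> Tuple[List[str], Optional[str]]:
    """Consume streamed text; return (verified steps, feedback or None).

    Free-form reasoning between tags is left untouched. Only completed
    <step> blocks are checked, and monitoring stops at the first invalid
    one so that the feedback can be used to steer the model.
    """
    buffer = ""
    checked = 0  # number of completed steps already verified
    verified: List[str] = []
    for chunk in chunks:
        buffer += chunk
        steps = STEP_PATTERN.findall(buffer)
        for raw in steps[checked:]:
            step = raw.strip()
            error = verifier(step)
            if error is not None:
                return verified, f"Step {step!r} is invalid: {error}."
            verified.append(step)
        checked = len(steps)
    return verified, None


# Usage: a tag split across two chunks is still detected once complete.
stream = [
    "Let me think. <step>2 + 3 = 5</st",
    "ep> Next, <step>5 * 4 = 21</step> so the answer is 21.",
]
verified_steps, feedback = monitor_trace(stream, check_arithmetic)
```

In this sketch the model's trace is not forced into externally defined steps; the monitor intervenes only when a tagged partial output fails its check, at which point the feedback string could be appended to the context to steer the next generation.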
