Complex coding tasks are increasingly tackled with autonomous agents and iterative repair pipelines. These systems rely on large amounts of test-time computation, often spending many decoding and repair steps before discovering whether a program compiles, runs, or validates. Executable parallel-code translation is a natural setting for studying earlier guidance because success is behavioral rather than textual. However, most guidance methods act only after complete programs or textual traces have been decoded. This motivates our question: can latent reasoning provide an earlier intervention point, before the model commits to code? We study a test-time latent guidance method for this setting. It trains a smaller Process Reward Model (PRM) over continuous latent prefixes and uses it to select among alternative hidden-state trajectories before the final code is decoded; this selection is separate from, but compatible with, post-decoding optimization. On a 76-task ParaTrans benchmark evaluation, latent PRM guidance improves the mean validation rate from 32.89% with unguided latent reasoning to 42.1%, outperforming the fine-tuned and vanilla baselines in the same setting. The gains persist when all methods are run under the same three-iteration repair loop. These results provide bounded evidence that useful alternative latent continuations exist and that PRM-scored latent branch selection can improve executable outcomes in this setting without retraining the main generative model.
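The branch-selection step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the scorer `toy_prm_score` (mean pooling followed by a logistic layer), the random candidate generator `sample_candidates`, and all parameter names are hypothetical stand-ins, chosen only to show the control flow of scoring several latent trajectories and keeping the highest-scoring one before any code is decoded.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
Trajectory = List[Vector]  # one hidden-state vector per latent reasoning step


def toy_prm_score(prefix: Sequence[Vector], weights: Vector) -> float:
    """Hypothetical PRM: mean-pool the latent prefix, then apply a logistic layer.

    A real PRM would be a trained network; this stand-in only mimics its
    interface (latent prefix in, scalar score in (0, 1) out).
    """
    dim = len(weights)
    pooled = [sum(step[i] for step in prefix) / len(prefix) for i in range(dim)]
    logit = sum(w * x for w, x in zip(weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


def sample_candidates(num_branches: int, num_steps: int, dim: int,
                      seed: int = 0) -> List[Trajectory]:
    """Assumed stand-in for sampling alternative latent continuations."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_steps)]
        for _ in range(num_branches)
    ]


def select_branch(candidates: Sequence[Trajectory],
                  score_fn: Callable[[Trajectory], float]) -> int:
    """Return the index of the candidate trajectory with the highest score."""
    scores = [score_fn(candidate) for candidate in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


if __name__ == "__main__":
    weights = [0.5, -0.25, 1.0, 0.0, 0.75]  # illustrative, not learned
    candidates = sample_candidates(num_branches=4, num_steps=3, dim=5, seed=1)
    best = select_branch(candidates, lambda c: toy_prm_score(c, weights))
    # Only the selected trajectory would be passed on to final code decoding.
    print("selected branch:", best)
```

In this sketch the main generative model is never updated: guidance enters only through which latent trajectory is handed to the decoder, which is why the approach is compatible with a repair loop that runs after decoding.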