This work addresses test output prediction, a key challenge in test case generation. To make LLM-predicted outputs more reliable, prior approaches first generate code and ground the prediction in it. One grounding strategy is to execute the generated code directly, but even minor errors in that code can cause execution to fail. To address this, we introduce LLM-based pseudocode execution, which grounds the prediction in more error-resilient pseudocode and simulates its execution through LLM reasoning. We further propose DuET, a dual-execution framework that combines both approaches through functional majority voting. Our analysis shows that the two approaches are complementary: direct execution is limited by code errors and pseudocode reasoning by hallucination, and each compensates for the other's weakness. On LiveCodeBench, DuET achieves state-of-the-art performance, improving Pass@1 by 13.6 percentage points.
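The dual-execution idea can be illustrated with a minimal sketch. It is not DuET's actual implementation: the abstract does not specify how functional majority voting is computed, so the sketch assumes that each candidate casts one vote for the output it produces on the test input. The names `run_direct`, `majority_vote`, and the entry point `solve` are hypothetical, and the pseudocode-simulation outputs are hard-coded stand-ins for what an LLM might return.

```python
from collections import Counter
from typing import List, Optional


def run_direct(programs: List[str], test_input: str) -> List[str]:
    """Directly execute each generated program; drop candidates that fail."""
    outputs = []
    for source in programs:
        scope: dict = {}
        try:
            # Demo only: real systems must sandbox untrusted generated code.
            exec(source, scope)
            # Assumption: every candidate defines a function named `solve`.
            outputs.append(repr(scope["solve"](test_input)))
        except Exception:
            continue  # a code error removes this candidate from the vote
    return outputs


def majority_vote(direct: List[str], simulated: List[str]) -> Optional[str]:
    """Pool the votes from both execution paths and return the top output."""
    votes = Counter(direct) + Counter(simulated)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Three hypothetical generated programs for "sum the integers in a string".
programs = [
    "def solve(x):\n    return sum(int(t) for t in x.split())",
    "def solve(x):\n    return sum(int(t) for t in x.split(','))",  # crashes on this input
    "def solve(x):\n    return sum(int(t) for t in x.split()) + 1",  # runs, but wrong
]
test_input = "1 2 3"

direct_outputs = run_direct(programs, test_input)
# Stand-in for LLM-simulated pseudocode execution; the last one is hallucinated.
simulated_outputs = ["6", "6", "5"]

predicted = majority_vote(direct_outputs, simulated_outputs)
print(predicted)
```

In this toy run, one program is lost to a runtime error and one simulated output is wrong, yet the pooled vote still settles on the correct answer, which is the kind of complementarity the abstract describes.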