Large language models (LLMs) often generate code that is functionally correct but inefficient in runtime and memory usage. Prior approaches to improving code efficiency typically rely on absolute execution feedback, such as profiling a single program's runtime or memory usage, which is costly and provides weak guidance for refinement. We propose Relative Contrastive Feedback (RCF), an inference-time feedback mechanism that requires no model fine-tuning or parameter updates. RCF compares two structurally similar programs for the same task and highlights the differences associated with better efficiency. Building on this idea, we introduce EffiPair, an iterative refinement framework that operates entirely at inference time. EffiPair generates multiple candidate solutions, identifies informative program pairs with large efficiency gaps, summarizes their execution differences into lightweight feedback, and uses this signal to produce more efficient solutions. By replacing isolated scalar feedback with pairwise contrastive comparisons, EffiPair provides more direct guidance while reducing profiling and prompting overhead. Experiments on code-efficiency benchmarks show that EffiPair consistently improves efficiency while preserving correctness. For instance, with DeepSeek-Chat V3.2, EffiPair achieves up to 1.5x speedup over generation without performance feedback, while reducing token usage by more than 90% compared to prior work.
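The pair-selection and feedback steps described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `pick_contrastive_pair` and `contrastive_feedback`, the `min_similarity` threshold, the use of `difflib` text similarity as a stand-in for structural similarity, and the hard-coded runtimes are all assumptions made for illustration.

```python
import difflib
from itertools import combinations


def pick_contrastive_pair(sources, runtimes, min_similarity=0.2):
    """Return (slow_idx, fast_idx) for the sufficiently similar pair with the
    largest runtime ratio, or None if no pair qualifies.

    Text similarity from difflib is a hypothetical proxy for the structural
    similarity mentioned in the abstract.
    """
    best, best_ratio = None, 1.0
    for i, j in combinations(range(len(sources)), 2):
        similarity = difflib.SequenceMatcher(None, sources[i], sources[j]).ratio()
        if similarity < min_similarity:
            continue
        slow, fast = (i, j) if runtimes[i] >= runtimes[j] else (j, i)
        ratio = runtimes[slow] / max(runtimes[fast], 1e-12)
        if ratio > best_ratio:
            best, best_ratio = (slow, fast), ratio
    return best


def contrastive_feedback(sources, runtimes, pair):
    """Summarize a (slow, fast) pair as a short diff-based feedback string."""
    slow, fast = pair
    speedup = runtimes[slow] / runtimes[fast]
    diff = difflib.unified_diff(
        sources[slow].splitlines(),
        sources[fast].splitlines(),
        fromfile="slower",
        tofile="faster",
        lineterm="",
    )
    header = f"The faster program ran {speedup:.1f}x faster. Differences:"
    return "\n".join([header, *diff])


# Illustrative candidates for one task; the runtimes (in seconds) are made-up
# numbers, not measurements.
sources = [
    "def total(n):\n    s = 0\n    for i in range(n):\n        s += i\n    return s\n",
    "def total(n):\n    return sum(range(n))\n",
    "def total(n):\n    return n * (n - 1) // 2\n",
]
runtimes = [0.30, 0.20, 0.01]

pair = pick_contrastive_pair(sources, runtimes)
if pair is not None:
    print(contrastive_feedback(sources, runtimes, pair))
```

In a full refinement loop, the resulting string would be appended to the next generation prompt in place of a raw profiler report; how the framework actually measures efficiency and formats its feedback is not specified in this abstract.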