Large language models (LLMs) for code generation are commonly evaluated under repeated sampling with Pass@k: multiple candidate programs are sampled within a finite budget and executed against unit tests. While recent verifier-based reinforcement learning (RLVR) methods improve executable correctness, how these objectives affect redundancy among sampled programs remains poorly understood. In this work, we study implementation-level redundancy in code generation using JPlag, a plagiarism-detection system for source code. Across models and benchmarks, we show that correctness-only RLVR often concentrates generations around repeated implementations, whereas Pass@k-aware objectives maintain lower redundancy and improve performance at larger sampling budgets. Motivated by these observations, we augment RLVR with direct anti-redundancy rewards based on JPlag similarity. Across three models and three benchmarks, discouraging near-duplicate generations reliably improves finite-budget executable performance, often matching or outperforming specialized Pass@k-aware objectives.
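To make the two quantities discussed above concrete, the following is a minimal illustrative sketch, not the method used in this work. `pass_at_k` is the standard unbiased Pass@k estimator computed from n samples of which c pass the unit tests. `redundancy_penalized_rewards` is a hypothetical reward shaping that subtracts a similarity penalty from a binary correctness reward. It assumes Python's `difflib.SequenceMatcher` as a crude stand-in for JPlag similarity, and the function name, the `weight` parameter, and the max-over-peers penalty are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimate from n samples, c of which pass all tests."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a uniformly
    # random size-k subset of the n samples contains no passing program.
    return 1.0 - comb(n - c, k) / comb(n, k)


def redundancy_penalized_rewards(programs, passed, weight=0.5):
    """Hypothetical shaping: correctness reward minus a near-duplicate penalty.

    ASSUMPTION: character-level SequenceMatcher similarity stands in for
    JPlag similarity; the actual reward in the paper is not reproduced here.
    """
    rewards = []
    for i, prog in enumerate(programs):
        # Highest similarity between this program and any other sample
        # in the same group (0.0 if it is the only sample).
        max_sim = max(
            (
                SequenceMatcher(None, prog, other).ratio()
                for j, other in enumerate(programs)
                if j != i
            ),
            default=0.0,
        )
        rewards.append(float(passed[i]) - weight * max_sim)
    return rewards
```

Under this sketch, two identical passing programs each receive a lower reward than a passing program that differs from the rest of the group, which is the qualitative behavior an anti-redundancy reward is meant to produce.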