Reinforcement learning (RL) is increasingly used for code-centric tasks, including code generation, summarization, understanding, repair, testing, and optimization, and this trend is accelerating with large language models and autonomous agents. A key challenge is designing reward signals that are meaningful for software. In many RL problems the reward is a well-defined scalar, but in software the goal is rarely a single numeric objective, so rewards are usually proxies: whether the code compiles, passes tests, or satisfies quality metrics. Many reward designs have been proposed for code-related tasks, but the work is scattered across research areas and papers, and no single survey brings these approaches together to show the full landscape of reward design for RL in software. In this survey, we provide the first systematic and comprehensive review of reward engineering for RL in software tasks, focusing on existing methods and techniques. We structure the literature along three complementary dimensions and summarize the reward-design choices within each. We conclude with challenges and recommendations for reward design in software engineering (SE) tasks.
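To make the notion of a proxy reward concrete, the following is a minimal sketch of one possible reward that combines a compile check with a test pass rate. It is not a method taken from any surveyed paper: the function name `proxy_reward`, the `compile_weight` parameter, and the convention that each test is a callable receiving the executed namespace are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical convention: a test receives the namespace produced by
# executing the candidate code and returns True on success.
Test = Callable[[Dict[str, object]], bool]


def proxy_reward(source: str, tests: List[Test],
                 compile_weight: float = 0.2) -> float:
    """Illustrative proxy reward in [0, 1] for a generated Python snippet.

    0.0             -> the snippet does not compile
    compile_weight  -> it compiles but passes no tests (or fails on load)
    up to 1.0       -> scaled by the fraction of tests passed
    """
    try:
        code = compile(source, "<candidate>", "exec")
    except SyntaxError:
        return 0.0  # compile check failed

    namespace: Dict[str, object] = {}
    try:
        # NOTE: unsandboxed execution, acceptable only in a toy example.
        exec(code, namespace)
    except Exception:
        return compile_weight  # compiles, but crashes when loaded

    if not tests:
        return compile_weight

    passed = 0
    for test in tests:
        try:
            if test(namespace):
                passed += 1
        except Exception:
            pass  # a crashing test counts as a failure

    return compile_weight + (1.0 - compile_weight) * passed / len(tests)
```

A sketch like this also illustrates why such rewards are only proxies: a candidate can obtain full reward by satisfying the given tests without being correct in general, and the relative weight of compilation versus tests is a design choice rather than something dictated by the task.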