SkelDPO: A Skeleton-Guided Direct Preference Optimization Framework for Efficient Code Generation

As Code Large Language Models (Code LLMs) have become increasingly reliable at producing semantically correct programs, execution efficiency has become an important dimension for evaluating their practical utility. However, existing approaches typically treat the full program as a single optimization target during training and do not explicitly model the structural factors that influence efficiency. As a result, although these models can generate semantically correct code, they fail to learn, at a fine-grained level, the underlying skeleton features that lead to efficient implementations. To address this limitation, we propose SkelDPO (Skeleton-Guided Direct Preference Optimization), a preference optimization framework that systematically improves the efficiency of generated code. SkelDPO first identifies efficient and inefficient implementations in the code dataset and, through comparative analysis, locates the points that make each implementation efficient or inefficient, forming alignment signals between efficient and inefficient skeletons. During training, a joint code and skeleton preference loss is introduced, enabling the model to learn semantic correctness while reinforcing its understanding of the efficiency-critical components of code. Experimental results show that SkelDPO consistently surpasses existing methods: compared with the SOTA method that relies solely on preference optimization over efficient and inefficient code, it improves Pass@1, Beyond@1, and Effi@1 by 3-6%, 3-7%, and 2-5%, respectively, with greater improvements observed on complex tasks. Overall, SkelDPO provides a new perspective on skeleton-level efficiency alignment, overcoming the limitation of conventional preference optimization, which relies solely on correctness or efficiency pairs. All datasets and source code are publicly available at: https://github.com/icpcSkelDPO/SkelDPO.
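To make the idea of a joint code and skeleton preference loss concrete, the following is a minimal sketch and not the authors' implementation. It assumes the joint objective is a standard DPO term over whole programs plus a weighted DPO term restricted to skeleton tokens. The function names, the boolean skeleton masks, and the weight `lam` are all hypothetical and introduced only for illustration; the abstract does not specify the exact form of the loss.

```python
import math
from typing import Sequence


def _log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_w: float, ref_w: float,
             policy_l: float, ref_l: float, beta: float = 0.1) -> float:
    # Standard DPO loss for one (preferred, rejected) pair, given summed
    # sequence log-probabilities under the policy and a frozen reference.
    margin = (policy_w - ref_w) - (policy_l - ref_l)
    return -_log_sigmoid(beta * margin)


def masked_sum(token_logps: Sequence[float], mask: Sequence[bool]) -> float:
    # Sum log-probabilities over the tokens flagged as skeleton tokens.
    return sum(lp for lp, keep in zip(token_logps, mask) if keep)


def joint_loss(pol_w: Sequence[float], ref_w: Sequence[float], mask_w: Sequence[bool],
               pol_l: Sequence[float], ref_l: Sequence[float], mask_l: Sequence[bool],
               beta: float = 0.1, lam: float = 0.5) -> float:
    # ASSUMPTION: joint loss = code-level DPO + lam * skeleton-level DPO.
    # "w" is the efficient (preferred) program, "l" the inefficient one.
    code_term = dpo_loss(sum(pol_w), sum(ref_w), sum(pol_l), sum(ref_l), beta)
    skel_term = dpo_loss(masked_sum(pol_w, mask_w), masked_sum(ref_w, mask_w),
                         masked_sum(pol_l, mask_l), masked_sum(ref_l, mask_l), beta)
    return code_term + lam * skel_term


if __name__ == "__main__":
    # Toy per-token log-probabilities; the masks mark hypothetical skeleton tokens.
    ref_w, ref_l = [-1.0, -2.0, -1.5], [-1.0, -2.5, -1.0]
    pol_w, pol_l = [-1.0, -1.0, -1.5], [-1.0, -3.0, -1.0]
    mask = [False, True, False]
    print(joint_loss(pol_w, ref_w, mask, pol_l, ref_l, mask))
```

Under these assumptions, a policy that raises the likelihood of the efficient skeleton tokens is rewarded twice, once through the whole-program term and once through the skeleton term, which is one plausible way to emphasize efficiency-critical components during training.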
