Large Language Models (LLMs) have revolutionized code generation but demand significant computational resources and often over-generalize, limiting their task-specific efficiency. Fine-tuning smaller, open-source LLMs offers a cost-effective alternative. However, standard supervised approaches rely only on correct examples, discarding the valuable signal in failed attempts. We introduce CodeLutra, a framework that learns from both correct and incorrect code attempts: rather than training on correct solutions alone, it applies iterative preference-based refinement, contrasting successful and failed outputs to steer the model toward the desired behavior. This narrows the performance gap with state-of-the-art larger models without requiring massive datasets or auxiliary models. For instance, on a challenging data science coding task, using only 500 samples improved Llama-3-8B's accuracy from 28.2% to 48.6%, approaching GPT-4's level. By learning from both successes and mistakes, CodeLutra offers a scalable and efficient path to high-quality code generation, making smaller open-source models more competitive with leading closed-source alternatives.
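The iterative preference-based refinement described above can be sketched in miniature: pair a model's correct and incorrect attempts at the same task, then score each pair with a DPO-style preference loss. This is a hypothetical illustration, not the paper's implementation; the function names, the dictionary schema, and the dummy log-probabilities are all assumptions introduced for the example.

```python
import math

def build_preference_pairs(attempts):
    """Pair every passing attempt with every failing attempt for one task.

    Each attempt is a dict with a "code" string and a boolean "passed"
    flag (e.g. from running the task's unit tests). Schema is illustrative.
    """
    passed = [a for a in attempts if a["passed"]]
    failed = [a for a in attempts if not a["passed"]]
    return [(p, f) for p in passed for f in failed]

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style loss on one preference pair, given sequence log-probs
    under the policy and a frozen reference model."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    # -log(sigmoid(margin)): small when the policy prefers the chosen sample
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy example: two attempts at the same task, one correct, one buggy.
attempts = [
    {"code": "def add(a, b): return a + b", "passed": True},
    {"code": "def add(a, b): return a - b", "passed": False},
]
pairs = build_preference_pairs(attempts)
loss = dpo_loss(-1.2, -3.4, ref_chosen=-1.5, ref_rejected=-3.0)
```

In the full framework this pairing-and-optimization step would repeat each round: the refined model generates fresh attempts, execution feedback labels them, and new pairs drive the next update.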