Large Language Models (LLMs) have significantly advanced automated code generation, yet they struggle with complex coding tasks that require multi-step logical reasoning. High-quality reasoning data is crucial for improving LLMs' reasoning capabilities, but such datasets remain scarce. Existing approaches rely either on computationally expensive reinforcement learning (RL) or on error-prone reasoning chains synthesized by LLMs, which limits scalability and accuracy. To address these challenges, we propose SVRC (Structured and Validated Reasoning Chains for Code Generation), a novel framework that mines, restructures, and enriches reasoning chains from community-driven discussions on software engineering platforms. SVRC refines unstructured and incomplete discussions of coding problems by aligning them with Software Development Life Cycle (SDLC) principles, ensuring that reasoning chains capture real-world problem-solving strategies and support iterative refinement. To evaluate the effectiveness of SVRC, we introduce CodeThinker, an LLM fine-tuned on 12,444 reasoning-augmented samples generated by SVRC. Experiments on LiveCodeBench show that CodeThinker surpasses its base model by 42.86\% on medium-level code problems in terms of pass@1 and outperforms GPT-4o-mini and GPT-4o by 73.14\% and 115.86\%, respectively. Our ablation study further shows that each component of SVRC contributes to the reasoning capabilities of CodeThinker.