Recent progress in large language models (LLMs) has advanced automatic code generation, yet most approaches rely on direct, single-step translation from problem descriptions to code, disregarding structured software engineering practices. We introduce a lifecycle-aware framework that systematically incorporates intermediate artifacts such as requirements analysis, state machine modeling, and pseudocode into both the training and inference stages. This design aligns code generation with standard software development phases and enables more structured reasoning. Experiments show that lifecycle-level fine-tuning improves code correctness by up to 75% relative to the same model before fine-tuning, with performance gains compounding across intermediate stages. Multi-step inference consistently surpasses single-step generation, demonstrating the effectiveness of intermediate scaffolding. Notably, open-source LLMs fine-tuned under our framework match or slightly outperform models pretrained on code. When applied to DeepSeek-Coder-1.3B, our framework yields relative CodeBLEU improvements of 34.3%, 20.0%, 11.2%, and 22.3% over ChatGPT-3.5, ChatGPT-4o-mini, DeepSeek-R1, and LLaMA-8B, respectively. The pipeline also remains robust when trained with up to 80% less data. Ablation studies further reveal that each intermediate artifact contributes distinctly to final code quality, with state machine modeling yielding the most substantial impact. Our source code and detailed experimental data are available at https://anonymous.4open.science/r/Lifecycle-Aware-3CCB.
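To make the staged inference concrete, the following is a minimal sketch, not the authors' released implementation, of how lifecycle-aware multi-step generation can be driven: the model is queried once per lifecycle stage, and each intermediate artifact is fed into the next prompt. The query_llm wrapper and the prompt templates are illustrative assumptions only.

from typing import Callable

# Prompt templates for each lifecycle stage; later stages reference the
# artifacts produced by earlier ones. Wording is illustrative, not the
# authors' actual prompts.
STAGES = [
    ("requirements", "Analyze the requirements of this problem:\n{problem}"),
    ("state_machine", "Model the solution as a state machine.\n"
                      "Problem:\n{problem}\nRequirements:\n{requirements}"),
    ("pseudocode", "Write pseudocode that follows the state machine.\n"
                   "Problem:\n{problem}\nState machine:\n{state_machine}"),
    ("code", "Implement the final code from the pseudocode.\n"
             "Problem:\n{problem}\nPseudocode:\n{pseudocode}"),
]

def lifecycle_generate(problem: str, query_llm: Callable[[str], str]) -> dict:
    """Run staged inference and return all intermediate artifacts plus the code."""
    artifacts = {"problem": problem}
    for name, template in STAGES:
        prompt = template.format(**artifacts)  # condition on earlier artifacts
        artifacts[name] = query_llm(prompt)    # one model call per lifecycle stage
    return artifacts

The same staged decomposition can be mirrored at training time by fine-tuning on (prompt, artifact) pairs for each stage rather than only on (problem, code) pairs.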