We introduce ClarifyGPT, a novel framework that aims to enhance code generation by enabling LLMs to identify ambiguous requirements and ask targeted clarifying questions. Specifically, ClarifyGPT first detects whether a given requirement is ambiguous by performing a code consistency check. If the requirement is ambiguous, ClarifyGPT prompts an LLM to generate targeted clarifying questions. After receiving the answers to these questions, ClarifyGPT refines the ambiguous requirement and feeds it to the same LLM to generate the final code solution. To evaluate ClarifyGPT, we first conduct a human evaluation in which ten participants use ClarifyGPT for code generation on two publicly available benchmarks: MBPP-sanitized and MBPP-ET. The results show that ClarifyGPT raises the performance (Pass@1) of GPT-4 from 70.96% to 80.80% on MBPP-sanitized. Furthermore, to perform large-scale automated evaluations of ClarifyGPT across different LLMs and benchmarks without requiring user participation, we introduce a high-fidelity method for simulating user responses. The automated evaluation results also demonstrate that ClarifyGPT significantly improves code generation performance over the baselines. In particular, ClarifyGPT improves the average performance of GPT-4 and ChatGPT across four benchmarks from 68.02% to 75.75% and from 58.55% to 67.22%, respectively. We believe that ClarifyGPT can effectively facilitate the practical application of LLMs in real-world development environments.
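The detect, ask, refine, and generate loop described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the code consistency check is implemented, so the version below assumes one plausible reading: sample several candidate solutions, run them on a few inputs, and treat disagreement among their outputs as a sign of ambiguity. All names here (`run_candidate`, `is_ambiguous`, `clarify_and_generate`, and the `sample_code`, `ask_questions`, `answer_questions` callables) are hypothetical placeholders for illustration, not ClarifyGPT's actual interface.

```python
import copy
from typing import Callable, List, Sequence


def run_candidate(source: str, func_name: str, args: tuple) -> str:
    """Run one candidate solution on one input and return a comparable outcome.

    Assumption: candidates are Python source strings defining `func_name`.
    A real system should execute untrusted code in a sandbox, not with exec().
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        # Deep-copy the arguments so one candidate cannot mutate another's input.
        return repr(namespace[func_name](*copy.deepcopy(args)))
    except Exception as exc:
        return f"error:{type(exc).__name__}"


def is_ambiguous(candidates: Sequence[str], func_name: str,
                 test_inputs: Sequence[tuple]) -> bool:
    """Assumed consistency check: candidates that disagree on any input
    suggest the requirement admits more than one interpretation."""
    behaviours = {
        tuple(run_candidate(src, func_name, args) for args in test_inputs)
        for src in candidates
    }
    return len(behaviours) > 1


def clarify_and_generate(
    requirement: str,
    sample_code: Callable[[str, int], List[str]],           # hypothetical LLM sampler
    ask_questions: Callable[[str, List[str]], List[str]],   # hypothetical question generator
    answer_questions: Callable[[List[str]], List[str]],     # user or simulated user
    func_name: str,
    test_inputs: Sequence[tuple],
    n_samples: int = 5,
) -> str:
    """Detect ambiguity, ask, refine the requirement, then generate final code."""
    candidates = sample_code(requirement, n_samples)
    if not is_ambiguous(candidates, func_name, test_inputs):
        return candidates[0]
    questions = ask_questions(requirement, candidates)
    answers = answer_questions(questions)
    clarifications = "\n".join(f"- {q} {a}" for q, a in zip(questions, answers))
    refined = f"{requirement}\nClarifications:\n{clarifications}"
    # The refined requirement goes back to the same sampler (the same LLM).
    return sample_code(refined, 1)[0]
```

In this sketch, `answer_questions` can be backed either by a human participant or by a simulated user, which mirrors the two evaluation settings described above.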