Large language models (LLMs) have demonstrated notable proficiency in code generation, and numerous prior studies have shown promising capabilities across a range of development scenarios. However, these studies mainly evaluate LLMs in research settings, which leaves a significant gap in understanding how effectively LLMs can support developers in real-world settings. To address this gap, we conducted an empirical analysis of conversations in DevGPT, a dataset of developers' conversations with ChatGPT (captured with the Share Link feature on platforms such as GitHub). Our findings indicate that LLM-generated code is currently used mainly to demonstrate high-level concepts or to provide examples in documentation, rather than as production-ready code. This suggests that substantial further work is needed to improve LLM code generation before LLMs can become an integral part of modern software development.