Large language models (LLMs) have demonstrated notable proficiency in code generation, and numerous prior studies report promising capabilities across a variety of development scenarios. However, these studies mainly evaluate LLMs in research settings, which leaves a significant gap in understanding how effectively LLMs support developers in real-world settings. To address this gap, we conducted an empirical analysis of conversations in DevGPT, a dataset of developers' conversations with ChatGPT (captured via the Share Link feature and shared on platforms such as GitHub). Our findings indicate that, in current practice, LLM-generated code is typically used only to demonstrate high-level concepts or to provide examples in documentation, rather than as production-ready code. These results suggest that substantial further work is needed to improve LLM code generation before LLMs can become an integral part of modern software development.