Is ChatGPT the Ultimate Programming Assistant -- How far is it?

Recent progress in generative AI techniques has significantly influenced software engineering, as AI-driven methods tackle common developer challenges such as code synthesis from descriptions, program repair, and natural language summarization of existing programs. Large-scale language models (LLMs), like OpenAI's Codex, are increasingly adopted in AI-driven software engineering. ChatGPT, another LLM, has gained considerable attention for its potential as a bot for discussing source code, suggesting changes, providing descriptions, and generating code. To evaluate the practicality of LLMs as programming assistant bots, it is essential to examine their performance on unseen problems and across various tasks. In our paper, we conduct an empirical analysis of ChatGPT's potential as a fully automated programming assistant, emphasizing code generation, program repair, and code summarization. Our study assesses ChatGPT's performance on common programming problems and compares it to state-of-the-art approaches using two benchmarks. Our research indicates that ChatGPT effectively handles typical programming challenges. However, we also discover limitations in its attention span: comprehensive descriptions can restrict ChatGPT's focus and impede its ability to use its extensive knowledge for problem-solving. Surprisingly, we find that ChatGPT's summary explanations of incorrect code provide valuable insights into the developer's original intentions. This insight can serve as a foundation for future work addressing the oracle problem. Our study offers valuable perspectives on the development of LLMs for programming assistance, specifically by highlighting the significance of prompt engineering and by improving our understanding of ChatGPT's practical applications in software engineering.
