Extending the Frontier of ChatGPT: Code Generation and Debugging

Large language models (LLMs) have emerged as a major innovation in question answering and conversational agents. These models, built on deep learning architectures such as the Transformer, are trained on vast corpora to predict text in response to a given query. Among them, ChatGPT, developed by OpenAI, has been applied to diverse problem domains, from composing essays and biographies to solving intricate mathematical integrals, and these versatile applications offer substantial value to users. However, assessing ChatGPT's output is challenging, particularly when queries lack clear objective criteria for correctness. Evaluating the quality of a generated essay, for instance, is laborious and relies heavily on manual review, in contrast to evaluating solutions to well-defined, closed-ended questions such as mathematical problems. This paper examines how well ChatGPT solves programming problems, considering both the correctness of its solutions and their efficiency in terms of time and memory complexity. The study finds an overall success rate of 71.875%, where the success rate is the proportion of problems for which ChatGPT produced a solution that passed all of the test cases on LeetCode. ChatGPT exhibits strengths in structured problems, and its success rate correlates linearly with problem acceptance rates. However, it struggles to improve its solutions based on feedback, pointing to potential shortcomings in debugging tasks. These findings provide a compact view of ChatGPT's capabilities and areas for improvement.


