Unmasking the giant: A comprehensive evaluation of ChatGPT's proficiency in coding algorithms and data structures

The transformative influence of Large Language Models (LLMs) is profoundly reshaping the Artificial Intelligence (AI) technology domain. Among these models, ChatGPT stands out, demonstrating remarkable performance in multi-turn conversations and coding proficiency across many programming languages. In this paper, we carry out a comprehensive evaluation of ChatGPT's coding capabilities based on what is, to date, the largest catalog of coding challenges. Our focus is on the Python programming language and on problems centered on data structures and algorithms, two topics at the very foundation of Computer Science. We evaluate ChatGPT on its ability to generate correct solutions to the problems given to it, on the quality of its code, and on the nature of the run-time errors its code throws. Where ChatGPT's code executes successfully but fails to solve the problem at hand, we examine patterns in the test cases it passes to gain insight into how wrong its code is in such situations. To infer whether ChatGPT might have directly memorized some of its training data, we methodically design an experiment to investigate this phenomenon. Comparing with human performance whenever feasible, we investigate all of the above questions for both of ChatGPT's underlying models (GPT-3.5 and GPT-4), across a vast array of sub-topics within the main topics, and on problems of varying difficulty.
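The evaluation dimensions above (correctness, the types of run-time errors thrown, and how many test cases a wrong solution still passes) can be illustrated with a small harness. This is a hypothetical sketch, not the authors' pipeline: the `(args, expected)` test-case format, the verdict labels, and the example functions `buggy_max` and `crashing_max` are all assumptions made for illustration, and the candidate code runs in-process without the sandboxing a real evaluation would need.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical test-case format: (positional_args, expected_output)
TestCase = Tuple[Tuple[Any, ...], Any]


def evaluate_solution(solution: Callable[..., Any], tests: List[TestCase]) -> Dict[str, Any]:
    """Run one candidate solution against its tests and classify the outcome."""
    passed = 0
    for args, expected in tests:
        try:
            result = solution(*args)
        except Exception as exc:
            # Stop at the first crash and record the exception type for error analysis.
            return {"verdict": "runtime_error", "error": type(exc).__name__,
                    "passed": passed, "total": len(tests)}
        if result == expected:
            passed += 1
    verdict = "accepted" if passed == len(tests) else "wrong_answer"
    return {"verdict": verdict, "error": None, "passed": passed, "total": len(tests)}


def summarize(reports: List[Dict[str, Any]]) -> Tuple[Counter, Counter, List[float]]:
    """Aggregate verdicts, run-time error types, and pass fractions of wrong answers."""
    verdicts = Counter(r["verdict"] for r in reports)
    errors = Counter(r["error"] for r in reports if r["error"] is not None)
    wrong_pass_rates = [r["passed"] / r["total"] for r in reports if r["verdict"] == "wrong_answer"]
    return verdicts, errors, wrong_pass_rates


# Hypothetical model-generated answers to "return the maximum of a non-empty list".
def buggy_max(nums):
    best = 0  # bug: returns 0 when every element is negative
    for n in nums:
        best = max(best, n)
    return best


def crashing_max(nums):
    return sorted(nums)[len(nums)]  # bug: off-by-one index raises IndexError


tests = [(([3, 1, 2],), 3), (([-5, -2],), -2), (([7],), 7)]
# The built-in max serves as a correct reference answer.
reports = [evaluate_solution(f, tests) for f in (max, buggy_max, crashing_max)]
print(summarize(reports))
```

Here `buggy_max` executes without error but passes only some of the tests, which is the kind of partially wrong solution whose pass patterns the abstract proposes to study, while `crashing_max` contributes an `IndexError` to the run-time error tally.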
