We asked ChatGPT to participate in an undergraduate computer science exam on ``Algorithms and Data Structures''. The program was evaluated on the entire exam as posed to the students. We hand-copied its answers onto an exam sheet, which was then graded in a blind setup alongside those of 200 participating students. We find that ChatGPT narrowly passed the exam, obtaining 20.5 out of 40 points. This impressive performance indicates that ChatGPT can indeed succeed at challenging tasks like university exams. At the same time, the questions in our exam are structurally similar to those of other exams, solved homework problems, and teaching materials that can be found online and might have been part of ChatGPT's training data. Therefore, it would be inadequate to conclude from this experiment that ChatGPT has any understanding of computer science. We also assess the improvements brought by GPT-4. We find that GPT-4 would have obtained about 17\% more exam points than GPT-3.5, reaching the performance of the average student. The transcripts of our conversations with ChatGPT are available at \url{https://github.com/tml-tuebingen/chatgpt-algorithm-exam}, and the entire graded exam is in the appendix of this paper.