Over the past decade, predictive language modeling for code has proven to be a valuable tool for enabling new forms of automation for developers. More recently, we have seen the advent of general-purpose "large language models", based on neural transformer architectures, that have been trained on massive datasets of human-written text spanning code and natural language. However, despite the demonstrated representational power of such models, interacting with them has historically been constrained to specific task settings, limiting their general applicability. Many of these limitations were recently overcome with the introduction of ChatGPT, a language model created by OpenAI and trained to operate as a conversational agent, enabling it to answer questions and respond to a wide variety of commands from end users. The introduction of models such as ChatGPT has already spurred fervent discussion among educators, ranging from fear that students could use these AI tools to circumvent learning, to excitement about the new types of learning opportunities that they might unlock. However, given the nascent nature of these tools, we currently lack fundamental knowledge about how well they perform in different educational settings, and about the potential promise (or danger) that they might pose to traditional forms of instruction. As such, in this paper, we examine how well ChatGPT performs when tasked with answering common questions in a popular software testing curriculum. Our findings indicate that ChatGPT can provide correct or partially correct answers in 55.6% of cases, that it can provide correct or partially correct explanations of answers in 53.0% of cases, and that prompting the tool in a shared question context leads to a marginally higher rate of correct responses. Based on these findings, we discuss the potential promises and perils related to the use of ChatGPT by students and instructors.