Over the past decade, predictive language modeling for code has proven to be a valuable tool for enabling new forms of automation for developers. More recently, we have seen the advent of general-purpose "large language models", based on neural transformer architectures, that have been trained on massive datasets of human-written text spanning code and natural language. However, despite the demonstrated representational power of such models, interaction with them has historically been constrained to specific task settings, limiting their general applicability. Many of these limitations were recently overcome with the introduction of ChatGPT, a language model created by OpenAI and trained to operate as a conversational agent, enabling it to answer questions and respond to a wide variety of commands from end users. The introduction of models such as ChatGPT has already spurred fervent discussion among educators, ranging from fear that students could use these AI tools to circumvent learning to excitement about the new types of learning opportunities that they might unlock. However, given the nascent nature of these tools, we currently lack fundamental knowledge of how well they perform in different educational settings and of the potential promise (or danger) that they might pose to traditional forms of instruction. As such, in this paper, we examine how well ChatGPT performs when tasked with answering common questions in a popular software testing curriculum. Our findings indicate that ChatGPT can provide correct or partially correct answers in 44% of cases and correct or partially correct explanations of answers in 57% of cases, and that prompting the tool in a shared question context leads to a marginally higher rate of correct answers. Based on these findings, we discuss the potential promise and dangers related to the use of ChatGPT by students and instructors.