Over the past decade, predictive language modeling for code has proven to be a valuable tool for enabling new forms of automation for developers. More recently, we have seen the advent of general-purpose "large language models", based on neural transformer architectures, that have been trained on massive datasets of human-written text spanning code and natural language. Despite the demonstrated representational power of such models, interacting with them has historically been constrained to specific task settings, limiting their general applicability. Many of these limitations were recently overcome with the introduction of ChatGPT, a language model created by OpenAI and trained to operate as a conversational agent, which enables it to answer questions and respond to a wide variety of commands from end users. The introduction of models such as ChatGPT has already spurred fervent discussion among educators, ranging from fear that students could use these AI tools to circumvent learning to excitement about the new types of learning opportunities they might unlock. However, given the nascent nature of these tools, we currently lack fundamental knowledge of how well they perform in different educational settings, and of the potential promise (or danger) they might pose to traditional forms of instruction. In this poster, we therefore examine how well ChatGPT performs when tasked with answering common questions in a popular software testing curriculum. Our findings indicate that ChatGPT can provide correct or partially correct answers in 44% of cases and correct or partially correct explanations of answers in 57% of cases, and that prompting the tool in a shared question context leads to a marginally higher rate of correct answers. Based on these findings, we discuss the potential promise and dangers of the use of ChatGPT by students and instructors.