Evaluating the Effectiveness of LLMs in Introductory Computer Science Education: A Semester-Long Field Study

The integration of AI assistants into computer science education, driven especially by the development of Large Language Models (LLMs), has sparked significant debate. An emerging body of work has explored the use of LLMs in education, but few studies have examined their impact on students in entry-level programming courses, particularly in real-world contexts and over extended periods. To address this research gap, we conducted a semester-long, between-subjects study with 50 students using CodeTutor, an LLM-powered assistant developed by our research team. Our results show that students who used CodeTutor (the experimental group) achieved statistically significant improvements in their final scores compared to peers who did not use the tool (the control group). Within the experimental group, those without prior experience with LLM-powered tools demonstrated significantly greater performance gains than their counterparts. Students also gave positive feedback on CodeTutor's capabilities, though they raised concerns about its limited role in developing critical thinking skills. Over the semester, students' agreement with CodeTutor's suggestions decreased, with a growing preference for support from traditional human teaching assistants. Our analysis further reveals that the quality of user prompts was significantly correlated with the effectiveness of CodeTutor's responses. Building on these results, we discuss the implications of our findings for integrating Generative AI literacy into curricula to foster critical thinking skills, and we examine the temporal dynamics of user engagement with LLM-powered tools. We further discuss the discrepancy between the anticipated functions of tools and students' actual capabilities, which sheds light on the need for tailored strategies to improve educational outcomes.

