The integration of AI assistants, driven especially by the development of Large Language Models (LLMs), into computer science education has sparked significant debate. An emerging body of work has explored the use of LLMs in education, but few studies have examined their impact on students in entry-level programming courses, particularly in real-world contexts and over extended periods. To address this research gap, we conducted a semester-long, between-subjects study with 50 students using CodeTutor, an LLM-powered assistant developed by our research team. Our results show that students who used CodeTutor (the experimental group) achieved statistically significant improvements in their final scores compared to peers who did not use the tool (the control group). Within the experimental group, students without prior experience with LLM-powered tools demonstrated significantly greater performance gains than their counterparts. Students also gave positive feedback on CodeTutor's capabilities, though they raised concerns about its limited role in developing critical thinking skills. Over the semester, students' agreement with CodeTutor's suggestions decreased, and their preference for support from traditional human teaching assistants grew. Our analysis further reveals that the quality of user prompts was significantly correlated with the effectiveness of CodeTutor's responses. Building on these results, we discuss the implications of our findings for integrating Generative AI literacy into curricula to foster critical thinking skills, and we examine the temporal dynamics of user engagement with LLM-powered tools. We further discuss the discrepancy between the anticipated functions of such tools and students' actual capabilities, which highlights the need for tailored strategies to improve educational outcomes.