Combating Harms of Generative AI in CS1 with Code Review Interviews and a Flipped Classroom

Background and Context: Large Language Models (LLMs) are more accessible and accurate than ever before, raising significant concerns for computing educators. One major concern is that students use LLMs to bypass the effort needed to understand the concepts and develop the metacognitive strategies essential for success in computer science. Objectives: We contribute a unique approach to assessing and building student understanding through weekly oral code review assessments. These formative assessments incentivize students to understand the code they submit, regardless of whether it was generated by AI tools. We also use a flipped classroom so that students learn concepts outside of class, which leaves ample time for them to schedule code review interviews. Methods: We collected data from three semesters. We analyze student exam scores, keystroke logs, and surveys to understand how the new course policies affected student learning, behavior, and attitudes. Findings: Pairwise comparison of exam results reveals an increase in average scores for Fall 2025 compared to previous semesters, but the increase is not statistically significant. Keystroke logs show a significant increase in the ratio of characters pasted to total characters input into coding assignments in Fall 2025, pointing towards higher AI usage. Survey results show positive student sentiment towards code reviews at the end of Fall 2025, with nearly all negative feedback being addressable through better scheduling and more rigorous TA training. Implications: Oral code reviews with a flipped classroom appear to be effective at mitigating harms of LLM use while providing space for students to freely experiment with these tools. Our work suggests that students in Fall 2025 still show adequate understanding of the material covered in written exams, despite dramatic increases in LLM usage for coding assignments.
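The keystroke-log metric mentioned in the findings, characters pasted per total characters input, can be illustrated with a minimal sketch. The event format, the `InputEvent` type, the `"type"`/`"paste"` labels, and the `paste_ratio` function are hypothetical names chosen for illustration; the abstract does not describe the actual log format or analysis pipeline.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class InputEvent:
    """One logged insertion (hypothetical format, not the study's actual schema)."""
    kind: str  # assumed labels: "type" for keystrokes, "paste" for paste events
    text: str  # the characters this event inserted into the editor


def paste_ratio(events: Iterable[InputEvent]) -> float:
    """Return pasted characters divided by all characters input (0.0 if no input)."""
    events = list(events)
    pasted = sum(len(e.text) for e in events if e.kind == "paste")
    total = sum(len(e.text) for e in events)
    return pasted / total if total else 0.0


# Illustrative log for one assignment: one typed line and one pasted line.
events = [
    InputEvent("type", "def f(x):"),
    InputEvent("paste", "    return x * 2"),
]
print(paste_ratio(events))
```

A ratio near 0 would mean the submission was mostly typed, and a ratio near 1 would mean it was mostly pasted. A high ratio is only suggestive of AI use, since students may also paste their own code or starter code.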
