Large programming courses struggle to provide timely, detailed feedback on student code. We developed Mark My Works, a local autograding system that combines traditional unit testing with LLM-generated explanations. The system uses role-based prompts to analyze submissions, critique code quality, and generate pedagogical feedback while maintaining transparency in its reasoning process. We piloted the system in a 191-student engineering course, comparing AI-generated assessments with human grading on 79 submissions. While AI scores showed no significant linear correlation with human scores (r = -0.177, p = 0.124), both systems exhibited similarly left-skewed score distributions, suggesting they recognize comparable quality hierarchies despite different scoring philosophies. The AI system scored more conservatively (mean 59.95 vs. 80.53 for human graders) but generated substantially more detailed technical feedback.