This study investigates K--12 teachers' perceptions and experiences with AI-supported rubric generation during a summer professional development workshop ($n = 25$). Teachers used MagicSchool.ai to generate rubrics and practiced prompting to tailor criteria and performance levels. They then applied these rubrics to give feedback on a sample block-based programming activity, and afterward used a chatbot to deliver rubric-based feedback on the same work. Data were collected through pre- and post-workshop surveys, open discussions, and exit tickets, and the qualitative data were analyzed thematically. Teachers reported that they rarely create rubrics from scratch because the process is time-consuming and drawing clear distinctions between performance levels is challenging. After hands-on use, teachers described AI-generated rubrics as strong starting drafts that improved structure and clarified vague criteria. However, they emphasized the need for teacher oversight, citing generic or grade-inappropriate language, occasional mismatch with instructional priorities, and the substantial editing required. Survey results indicated high perceived clarity and ethical acceptability, moderate alignment with assignments, and usability as the primary weakness -- particularly the ability to add, remove, or revise criteria. Open-ended responses highlighted a ``strictness-versus-detail'' trade-off: AI feedback was often perceived as harsher but more detailed and scalable. Teachers therefore expressed conditional willingness to adopt AI rubric tools when workflows support easy customization and preserve teacher control.