This study investigates K--12 teachers' perceptions and experiences with AI-supported rubric generation during a summer professional development workshop ($n = 25$). Teachers used MagicSchool.ai to generate rubrics and practiced prompting to tailor criteria and performance levels. They then applied these rubrics to give feedback on a sample block-based programming activity, and afterward used a chatbot to deliver rubric-based feedback on the same work. Data were collected through pre- and post-workshop surveys, open discussions, and exit tickets, and the qualitative data were analyzed thematically. Teachers reported that they rarely create rubrics from scratch because the process is time-consuming and drawing clear distinctions between performance levels is challenging. After hands-on use, teachers described AI-generated rubrics as strong starting drafts that improved structure and clarified vague criteria. However, they emphasized the need for teacher oversight, citing generic or grade-inappropriate language, occasional mismatch with instructional priorities, and the substantial editing required. Survey results indicated high perceived clarity and ethical acceptability, moderate alignment with assignments, and usability as the primary weakness -- particularly the ability to add, remove, or revise criteria. Open-ended responses highlighted a ``strictness-versus-detail'' trade-off: AI feedback was often perceived as harsher but more detailed and scalable. Teachers therefore expressed conditional willingness to adopt AI rubric tools when workflows support easy customization and preserve teacher control.