Generative AI offers new opportunities for individualized and adaptive learning, e.g., through large language model (LLM)-based feedback systems. While LLMs can produce effective feedback for relatively straightforward conceptual tasks, delivering high-quality feedback for tasks that require advanced domain expertise, such as physics problem solving, remains a substantial challenge. This study presents the design of an LLM-based feedback system for physics problem solving grounded in evidence-centered design (ECD) and evaluates its performance within the German Physics Olympiad. Participants rated the usefulness and accuracy of the generated feedback and generally perceived it as useful and highly accurate. However, an in-depth analysis revealed that the feedback contained errors in 20% of cases, and these errors often went unnoticed by the students. We discuss the risks associated with uncritical reliance on LLM-based feedback and outline potential directions for generating more adaptive and reliable LLM-based feedback in the future.