Automated Evaluation of Classroom Instructional Support with LLMs and BoWs: Connecting Global Predictions to Specific Feedback

With the aim of providing teachers with more specific, frequent, and actionable feedback about their teaching, we explore how Large Language Models (LLMs) can be used to estimate "Instructional Support" domain scores of the CLassroom Assessment Scoring System (CLASS), a widely used observation protocol. We design a machine learning architecture that uses zero-shot prompting of Meta's Llama2, a classic Bag of Words (BoW) model, or both, to classify individual utterances of teachers' speech (transcribed automatically using OpenAI's Whisper) for the presence of Instructional Support. These utterance-level judgments are then aggregated over an entire 15-min observation session to estimate a global CLASS score. Experiments on two CLASS-coded datasets of toddler and pre-kindergarten classrooms indicate that (1) automatic CLASS Instructional Support estimation accuracy using the proposed method (Pearson $R$ up to $0.47$) approaches human inter-rater reliability (up to $R=0.55$); (2) LLMs yield slightly greater accuracy than BoW for this task, though the best models often combined features extracted from both LLM and BoW; and (3) for classifying individual utterances, automated methods still have room for improvement relative to human-level judgments. Finally, (4) we illustrate how the model's outputs can be visualized at the utterance level to provide teachers with explainable feedback on which utterances were most positively or negatively correlated with specific CLASS dimensions.
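
The utterance-to-session pipeline described above can be illustrated with a minimal, stdlib-only sketch. Everything concrete here is a hypothetical stand-in rather than the authors' method: the hand-set BoW word weights replace a trained classifier, the 0.5 threshold and the "fraction of flagged utterances" aggregation are assumptions, a one-feature least-squares line replaces whatever regressor the paper uses, and the sessions and scores are made-up toy data. The point is only the shape of the computation: score each utterance, aggregate per session, map to a global score, and compare against human codes with Pearson $R$.

```python
import re
from collections import Counter
from math import sqrt
from typing import Callable, Dict, List, Sequence, Tuple

# HYPOTHETICAL word weights standing in for a trained BoW utterance classifier.
# Real weights would be learned from utterances labeled for Instructional Support.
BOW_WEIGHTS: Dict[str, float] = {
    "why": 1.0, "how": 1.0, "think": 0.8,
    "predict": 0.9, "because": 0.6, "no": -0.3,
}


def bow_utterance_score(utterance: str) -> float:
    """Score one utterance as a weighted sum of its lowercased word counts."""
    counts = Counter(re.findall(r"[a-z']+", utterance.lower()))
    return sum(BOW_WEIGHTS.get(word, 0.0) * n for word, n in counts.items())


def session_feature(
    utterances: Sequence[str],
    scorer: Callable[[str], float],
    threshold: float = 0.5,  # assumed decision threshold
) -> float:
    """Aggregate utterance-level judgments into one session-level feature:
    the fraction of utterances the scorer flags as Instructional Support."""
    if not utterances:
        return 0.0
    flagged = sum(1 for u in utterances if scorer(u) >= threshold)
    return flagged / len(utterances)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ a + b * x; returns (a, b)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between predicted and human-coded session scores."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Toy, made-up transcripts and CLASS Instructional Support scores (1-7 scale).
sessions: List[List[str]] = [
    ["Why do you think the ice melted?", "Good job.", "How could we check?"],
    ["Sit down please.", "Good job.", "Line up."],
    ["What happens next? Predict!", "Because it is heavy.", "Stop that."],
    ["Why?", "No.", "Clean up."],
]
human_scores = [4.0, 1.5, 3.5, 2.5]

features = [session_feature(s, bow_utterance_score) for s in sessions]
a, b = fit_line(features, human_scores)
predicted = [a + b * x for x in features]
r = pearson_r(predicted, human_scores)
print(f"features={features}, r={r:.2f}")
```

Because `scorer` is just a `Callable[[str], float]`, a zero-shot LLM judgment (for example, a probability that an utterance shows Instructional Support) could be swapped in with the same signature; combining LLM and BoW features would then call for a multi-feature regressor instead of `fit_line`. This toy fits and evaluates on the same four sessions, so its correlation is in-sample; a faithful evaluation would fit on some sessions and compute $R$ on held-out ones.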
