Equity is a core concern of learning analytics. However, applications that teach and assess equity skills, particularly at scale, are lacking, often due to barriers in evaluating language. Advances in generative AI via large language models (LLMs) are being used in a wide range of applications, and the present work assesses their use in the equity domain. We evaluate tutor performance within an online lesson on enhancing tutors' skills when responding to students in potentially inequitable situations. We apply a mixed-method approach to analyze the performance of 81 undergraduate remote tutors. From pretest to posttest, we find marginally significant learning gains, accompanied by increases in tutors' self-reported confidence in their knowledge of how to respond to middle school students experiencing possible inequities. Both GPT-4o and GPT-4-turbo demonstrate proficiency in assessing tutors' ability to predict and explain the best approach. Balancing performance, efficiency, and cost, we determine that GPT-4o with few-shot learning is the preferred model. This work makes available a dataset of lesson log data, tutor responses, rubrics for human annotation, and generative AI prompts. Future work involves leveling the difficulty among scenarios and enhancing LLM prompts for large-scale grading and assessment.