In educational settings, estimating the difficulty of multiple-choice questions (MCQs), a commonly used strategy to assess learning progress, provides valuable information to both teachers and students. Since human assessment is costly from multiple points of view, automatic approaches to MCQ item difficulty estimation have been investigated, so far with mixed success. Our approach takes a different angle from previous work: by asking various Large Language Models (LLMs) to answer the questions in two different MCQ datasets, we leverage model uncertainty to estimate item difficulty. Using both model uncertainty features and textual features in a Random Forest regressor, we show that the uncertainty features contribute substantially to difficulty prediction, where difficulty is inversely proportional to the number of students who answer a question correctly. Beyond demonstrating the value of our approach, we observe that our model achieves state-of-the-art results on the publicly available BEA dataset.
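The core idea of turning model uncertainty into difficulty signals can be sketched as follows. This is a minimal illustration, not the paper's actual feature set: it assumes we have a model's probability distribution over the answer options of one MCQ, and it derives three commonly used uncertainty measures (entropy, top-option probability, and the top-1/top-2 margin) that could then feed a regressor alongside textual features.

```python
import math

def option_entropy(probs):
    """Shannon entropy (nats) of a probability distribution over MCQ options.
    Higher entropy means the model is less certain about the answer."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def uncertainty_features(option_probs):
    """Illustrative uncertainty features for one question, derived from a
    model's probabilities over the answer options (names are assumptions)."""
    ranked = sorted(option_probs, reverse=True)
    return {
        "entropy": option_entropy(option_probs),   # overall uncertainty
        "top_prob": ranked[0],                     # confidence in best option
        "margin": ranked[0] - ranked[1],           # gap between top two options
    }

# A confident model concentrates mass on one option; an uncertain model
# spreads it out. The entropy feature separates the two cases.
confident = uncertainty_features([0.90, 0.05, 0.03, 0.02])
uncertain = uncertainty_features([0.30, 0.28, 0.22, 0.20])
print(confident["entropy"] < uncertain["entropy"])  # True
```

In a full pipeline, such per-question features from several LLMs would be concatenated with textual features (e.g. question and option length, readability scores) and passed to a Random Forest regressor fitted against observed student correctness rates.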