Item difficulty plays a crucial role in adaptive testing. However, few works have focused on generating questions of varying difficulty, especially for multiple-choice (MC) cloze tests. We propose training pre-trained language models (PLMs) as surrogate models to enable item response theory (IRT) assessment, avoiding the need for human test subjects. We also propose two strategies for controlling the difficulty of both the gaps and the distractors, using ranking rules to reduce invalid distractors. Experiments on a benchmark dataset demonstrate that our proposed framework and methods can effectively control and evaluate the difficulty of MC cloze tests.
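To illustrate what IRT assessment with surrogate models can look like, here is a minimal sketch. The abstract does not say which IRT model or fitting procedure is used, so this assumes the one-parameter logistic (Rasch) model, P(correct) = sigmoid(theta_j - b_i), fitted by joint maximum likelihood with plain gradient ascent. Each row of the response matrix stands for one surrogate PLM and each column for one cloze item. The function name `fit_rasch`, its parameters, and the example data are hypothetical. Under joint maximum likelihood, an item answered correctly by every model (or by none) has no finite difficulty estimate, so such items should be filtered out first.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fit_rasch(responses, epochs=500, lr=0.1):
    """Hypothetical sketch: joint maximum-likelihood fit of a 1PL (Rasch) model.

    responses[j][i] is 1 if surrogate model j answered item i correctly, else 0.
    Returns (abilities, difficulties); a larger difficulty means a harder item.
    """
    n_models = len(responses)
    n_items = len(responses[0])
    theta = [0.0] * n_models  # surrogate-model abilities
    b = [0.0] * n_items       # item difficulties
    for _ in range(epochs):
        grad_theta = [0.0] * n_models
        grad_b = [0.0] * n_items
        for j in range(n_models):
            for i in range(n_items):
                p = sigmoid(theta[j] - b[i])
                residual = responses[j][i] - p
                # d logL / d theta_j = y - p ; d logL / d b_i = -(y - p)
                grad_theta[j] += residual
                grad_b[i] -= residual
        theta = [t + lr * g for t, g in zip(theta, grad_theta)]
        b = [d + lr * g for d, g in zip(b, grad_b)]
        # The model is identified only up to a common shift of theta and b,
        # so center the difficulties at zero and shift abilities to match.
        shift = sum(b) / n_items
        b = [d - shift for d in b]
        theta = [t - shift for t in theta]
    return theta, b
```

Items that fewer surrogate models answer correctly receive larger difficulty estimates, and `sigmoid(theta[j] - b[i])` gives the fitted probability that model `j` answers item `i` correctly. A 2PL or 3PL model would add per-item discrimination and guessing parameters.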