Multiple-choice examinations are a ubiquitous form of assessment, used to measure candidates' abilities across a wide range of domains and tasks. Maintaining the quality of proposed questions is of great importance to test designers, so newly proposed questions go through several pre-test evaluation stages before they can be deployed in real-world exams. This process is currently largely manual, which can introduce delays into the question development cycle. Automating it would substantially improve efficiency; however, existing datasets do not contain sufficient pre-test analysis information. In this paper, we introduce CamChoice, a multiple-choice comprehension dataset with questions at different target levels, in which each question is annotated with the true distribution of options selected by candidates. We introduce the task of candidate distribution matching, propose several evaluation metrics for the task, and demonstrate that automatic systems trained on RACE++ can serve as baselines for it. We further demonstrate that these systems can be used for practical pre-test evaluation tasks such as detecting underperforming distractors: our detection systems can automatically identify poor distractors that few candidates select. We release the data publicly for future research.