Recent work has shown that multiple-choice reading comprehension (MCRC) systems can, on average, answer questions significantly better than random without any access to the contextual passage. Rather than using information from the passage, these systems draw on their accumulated "world knowledge" to answer questions directly. This paper examines whether this observation can be exploited as a tool for test designers, helping them ensure that the use of "world knowledge" is acceptable for a particular set of questions. We propose information-theoretic metrics that assess the level of "world knowledge" exploited by systems. Two metrics are described: the expected number of options, which measures whether a passage-free system can identify the answer to a question using world knowledge; and the contextual mutual information, which measures the importance of context for a given question. We demonstrate that questions with a low expected number of options, and hence answerable by the shortcut system, are often similarly answerable by humans without context. This highlights that general-knowledge "shortcuts" could equally be used by exam candidates, and that our proposed metrics may help future test designers monitor the quality of questions.
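The abstract names the two metrics but gives no formulas, so the following is only a minimal sketch under stated assumptions. It assumes the expected number of options can be read as 2 raised to the entropy (in bits) of a passage-free system's distribution over the answer options, and it uses the per-question drop in entropy once the passage is supplied as a rough proxy for contextual mutual information. The function names and example probabilities are hypothetical and are not taken from the paper.

```python
import math


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete distribution over options."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_num_options(no_context_probs):
    """ASSUMED definition: 2 ** entropy of the passage-free distribution.

    A value near the number of options suggests the system is guessing;
    a value near 1 suggests world knowledge alone identifies the answer.
    """
    return 2 ** entropy_bits(no_context_probs)


def context_information(no_context_probs, with_context_probs):
    """ASSUMED proxy: entropy reduction (bits) once the passage is given."""
    return entropy_bits(no_context_probs) - entropy_bits(with_context_probs)


# Hypothetical four-option question where the passage-free system is guessing.
guessing = [0.25, 0.25, 0.25, 0.25]
# Hypothetical question the passage-free system answers from world knowledge.
shortcut = [0.97, 0.01, 0.01, 0.01]

print(expected_num_options(guessing))
print(expected_num_options(shortcut))
print(context_information(guessing, shortcut))
```

Under these assumptions, a question whose passage-free distribution is uniform scores the full number of options, while a question answerable by shortcut scores close to one, which is the kind of signal a test designer could use to flag questions for review.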