Multiple-choice questions are an efficient and widely used format for evaluating a learner's knowledge proficiency in standardized tests. Nevertheless, generating these questions, and plausible distractors (incorrect options) in particular, poses a considerable challenge. Distractor generation can generally be classified into cloze-style distractor generation (CDG) and natural questions distractor generation (NQDG). Compared with CDG, using pre-trained language models (PLMs) for NQDG presents three primary challenges: (1) PLMs are typically trained to generate ``correct'' content, such as answers, but are rarely trained to generate ``plausible'' content, such as distractors; (2) PLMs often struggle to produce content that aligns well with specific knowledge and the style of exams; (3) NQDG requires the model to produce longer, context-sensitive, and question-relevant distractors. In this study, we introduce DGRC, a fine-tuning framework for NQDG in Chinese multiple-choice reading comprehension drawn from authentic examinations. DGRC comprises three major components: hard chain-of-thought, multi-task learning, and generation mask patterns. Experimental results demonstrate that DGRC significantly enhances generation performance, achieving a more than 2.5-fold improvement in BLEU scores.