Creating multiple-choice questions to assess reading comprehension of a given article involves generating question-answer pairs (QAPs) and adequate distractors. We present two methods for QAP generation: (1) A deep-learning-based end-to-end question generation system built on the T5 Transformer with Preprocessing and Postprocessing Pipelines (TP3). We use a fine-tuned T5 model for the downstream task of question generation and improve accuracy by combining various NLP tools and algorithms in preprocessing and postprocessing to select appropriate answers and filter out undesirable questions. (2) A sequence-learning-based scheme that generates adequate QAPs via meta-sequence representations of sentences. A meta-sequence is a sequence of vectors comprising semantic and syntactic tags. We devise a scheme called MetaQA that learns meta-sequences from training data and forms pairs, each consisting of the meta-sequence of a declarative sentence and that of a corresponding interrogative sentence. TP3 works well on unseen data and is complemented by MetaQA. Both methods can generate well-formed, grammatically correct questions. Moreover, we present a novel approach to automatically generating adequate distractors for a given QAP. The method combines part-of-speech tagging, named-entity tagging, semantic-role labeling, regular expressions, domain knowledge bases, word embeddings, word edit distance, WordNet, and other algorithms.
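To make one of the listed ingredients concrete, the following is a minimal sketch of how word edit distance might be used when screening distractor candidates. It is an illustration under stated assumptions, not the method described above: the function names `edit_distance` and `filter_distractors`, the `min_distance` threshold, and the rule of discarding candidates that are near-duplicates of the answer are all hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def filter_distractors(answer, candidates, min_distance=3):
    """Keep candidates at least `min_distance` edits away from the answer.

    Assumption for illustration only: a candidate that is almost identical
    to the answer (a spelling or casing variant) is treated as a poor
    distractor and dropped. The threshold value is hypothetical.
    """
    answer_norm = answer.lower()
    return [c for c in candidates
            if edit_distance(answer_norm, c.lower()) >= min_distance]


if __name__ == "__main__":
    print(filter_distractors("Paris", ["paris", "Pariss", "London", "Berlin"]))
```

In this sketch the casing variant and the misspelling of the answer are dropped, while clearly different words are kept. In a fuller pipeline such a filter would presumably be only one signal among the others listed, such as word embeddings and WordNet.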