In this paper, we focus on task-specific question answering (QA). To this end, we introduce a method for generating exhaustive, high-quality training data, which allows us to train compact (e.g., small enough to run on a mobile device), task-specific QA models that are competitive with GPT variants. The key technological enabler is a novel mechanism for automatic question-answer generation from procedural text, which can ingest large amounts of textual instructions and produce exhaustive in-domain QA training data. While current QA data generation methods can produce well-formed and varied data, they do not cover the source text exhaustively, which makes them sub-optimal for training a QA model. In contrast, we leverage the highly structured nature of procedural text and represent each step, as well as the overall flow of the procedure, as graphs. We then condition on graph nodes to automatically generate QA pairs in an exhaustive and controllable manner. Comprehensive evaluations of our method show that: (1) small models trained with our data achieve excellent performance on the target QA task, even exceeding that of GPT3 and ChatGPT despite being several orders of magnitude smaller; and (2) semantic coverage is the key indicator of downstream QA performance. Crucially, while large language models excel at syntactic diversity, this does not necessarily translate into improvements in the end QA model. Instead, the higher semantic coverage provided by our method is critical for QA performance.
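To make the idea of conditioning on graph nodes concrete, the following is a minimal illustrative sketch, not the method evaluated in the paper. It assumes a toy procedure represented as a linear chain of step nodes and uses fixed question templates in place of a learned generator. The `Step` schema, the `generate_qa` helper, the templates, and the example procedure are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One node of a toy procedure graph (hypothetical schema)."""
    action: str
    obj: str
    # Attributes attached to the step, e.g. {"duration": "5 minutes"}.
    attributes: dict = field(default_factory=dict)


def generate_qa(steps):
    """Enumerate template-based QA pairs for every attribute and every flow edge."""
    qa = []
    for i, step in enumerate(steps):
        # Step-level questions: one per attribute of the node.
        for name, value in step.attributes.items():
            question = f"What is the {name} when you {step.action} the {step.obj}?"
            qa.append((question, value))
        # Flow-level questions: one per edge between consecutive steps.
        if i + 1 < len(steps):
            nxt = steps[i + 1]
            question = f"What do you do after you {step.action} the {step.obj}?"
            qa.append((question, f"{nxt.action} the {nxt.obj}"))
    return qa


# Hypothetical example procedure, assumed here to be a simple linear flow.
steps = [
    Step("boil", "water", {"duration": "5 minutes"}),
    Step("add", "pasta", {"quantity": "200 grams"}),
    Step("drain", "pasta"),
]

for question, answer in generate_qa(steps):
    print(question, "->", answer)
```

Because the loop visits every node attribute and every edge exactly once, the set of generated pairs is exhaustive with respect to this toy graph, and restricting the nodes or templates controls what gets generated. A real system would need richer graphs (for example, branching flows) and more varied question realization.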