Few-shot learning for open-domain multi-hop question answering typically relies on the in-context learning capability of large language models (LLMs). While powerful, these LLMs usually contain tens or hundreds of billions of parameters, making them inefficient at inference time. To improve the performance of smaller language models, we propose a data synthesis framework for multi-hop question answering that requires fewer than 10 human-annotated question-answer pairs. Our framework depends only on rich, naturally occurring relationships among documents and is built upon data generation functions parameterized by LLMs and prompts. We synthesize millions of multi-hop questions and claims, use them to finetune language models, and evaluate the finetuned models on popular benchmarks for multi-hop question answering and fact verification. Empirically, our approach improves model performance significantly, allowing the finetuned models to be competitive with GPT-3.5-based approaches while being almost one-third the size in parameter count.