Recent approaches to multilingual open-domain question answering (MLODQA) have achieved promising results given abundant language-specific training data. However, the considerable annotation cost limits the application of these methods to underrepresented languages. We introduce a \emph{few-shot learning} approach that synthesises large-scale multilingual data from large language models (LLMs). Our method begins with large-scale self-supervised pre-training using WikiData, followed by training on high-quality synthetic multilingual data generated by prompting LLMs with few-shot supervision. The final model, \textsc{FsModQA}, significantly outperforms existing few-shot and supervised baselines on MLODQA as well as on cross-lingual and monolingual retrieval. We further show that our method can be extended to effective zero-shot adaptation to new languages through a \emph{cross-lingual prompting} strategy that uses only English-supervised data, making it a general and practical solution for MLODQA tasks without costly large-scale annotation.
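To make the cross-lingual prompting idea concrete, the following is a minimal, hypothetical sketch of how English-annotated exemplars could be assembled into a few-shot prompt that asks an LLM to generate a synthetic QA pair in a new target language. The function name, exemplar format, and prompt wording are all illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: build a few-shot prompt from English supervision only,
# asking an LLM to produce a question-answer pair in a target language.
# All names and the prompt template below are illustrative assumptions.

def build_crosslingual_prompt(english_exemplars, target_passage, target_language):
    """Assemble a few-shot prompt using English (passage, question, answer)
    exemplars, ending with a passage in the target language."""
    lines = [
        f"Generate a question and answer in {target_language} "
        "for the final passage, following the English examples."
    ]
    for ex in english_exemplars:
        lines.append(f"Passage: {ex['passage']}")
        lines.append(f"Question: {ex['question']}")
        lines.append(f"Answer: {ex['answer']}")
    # The target-language passage is appended last; the LLM completes
    # the "Question:" / "Answer:" fields in that language.
    lines.append(f"Passage: {target_passage}")
    lines.append("Question:")
    return "\n".join(lines)

exemplars = [
    {"passage": "The Nile is a river in Africa.",
     "question": "On which continent is the Nile located?",
     "answer": "Africa"},
]
prompt = build_crosslingual_prompt(
    exemplars, "富士山は日本で最も高い山である。", "Japanese")
```

The resulting string would then be sent to an LLM of choice; the generated QA pairs form the synthetic training data described above.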