Building Efficient and Effective OpenQA Systems for Low-Resource Languages

Question answering (QA) is the task of answering questions posed in natural language with free-form natural language answers extracted from a given passage. In the OpenQA variant, only the question text is given, and the system must retrieve relevant passages from an unstructured knowledge source and use them to provide answers, as is the case for mainstream QA systems on the Web. QA systems are currently limited mostly to English because large-scale labeled QA datasets are lacking in other languages. In this paper, we show that effective, low-cost OpenQA systems can be developed for low-resource languages. The key ingredients are (1) weak supervision using machine-translated labeled datasets and (2) a relevant unstructured knowledge source in the target language. Furthermore, we show that only a few hundred gold assessment examples are needed to reliably evaluate these systems. We apply our method to Turkish as a challenging case study, since English and Turkish are typologically very distinct. We present SQuAD-TR, a machine translation of SQuAD2.0, and we build our OpenQA system by adapting ColBERT-QA for Turkish. Using two versions of Wikipedia dumps spanning two years, we obtain performance improvements of 9-34% in exact match (EM) score and 13-33% in F1 score over BM25-based and DPR-based baseline QA reader models. Our results show that SQuAD-TR makes OpenQA feasible for Turkish, which we hope will encourage researchers to build OpenQA systems in other low-resource languages. We make all the code, models, and the dataset publicly available.
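The EM and F1 scores reported here are, presumably, the usual SQuAD-style answer-level metrics. As a rough illustration only (this is not the evaluation code used for SQuAD-TR), a simplified Python sketch of both metrics could look like the following. The helper names `normalize`, `exact_match`, `f1_score`, and `best_over_golds` are hypothetical, and the normalization (lowercasing, punctuation stripping, whitespace collapsing) is an assumption. It deliberately omits the English article removal done by the official SQuAD script, which would not apply to Turkish.

```python
import string
from collections import Counter


def normalize(text: str) -> str:
    """Simplified SQuAD-style normalization (assumed): lowercase, drop ASCII punctuation, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # SQuAD2.0 convention: an empty answer means "unanswerable",
    # so score 1.0 only when both sides are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_golds(metric, prediction: str, golds: list[str]) -> float:
    """Score against every gold answer and keep the best, as SQuAD-style evaluation does."""
    return max(metric(prediction, gold) for gold in golds)


if __name__ == "__main__":
    print(exact_match("Mustafa Kemal Atatürk.", "mustafa kemal atatürk"))
    print(f1_score("Kemal Atatürk", "Mustafa Kemal Atatürk"))
```

Here the first call yields 1.0 because punctuation and case are normalized away. The second yields roughly 0.8 (precision 1, recall 2/3), which shows how F1 gives partial credit where EM gives none. One caveat for Turkish: Python's `str.lower()` maps "I" to "i" rather than the Turkish dotless "ı", so a real Turkish evaluation may need locale-aware case folding.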

翻译：问答（QA）是一项根据给定段落，从自然语言形式的问题中抽取自由形式的自然语言答案的任务。在开放问答（OpenQA）变体中，仅提供问题文本，系统需从非结构化知识源中检索相关段落并据此提供答案，这正是当前主流网络问答系统的应用场景。目前，由于非英语语言缺乏大规模标注的问答数据集，问答系统主要局限于英语。本文证明，针对低资源语言，可以开发出高效且低成本的有效开放问答系统。关键要素包括：（1）利用机器翻译的标注数据集进行弱监督学习；（2）拥有目标语言中相关的非结构化知识源。此外，我们证明仅需数百条人工标注的评估样本即可可靠地评估这些系统。我们以土耳其语作为挑战性案例研究，因其与英语在类型学上差异显著。我们提出SQuAD-TR（SQuAD2.0的机器翻译版本），并通过将ColBERT-QA适配至土耳其语来构建开放问答系统。基于跨度两年的两个维基百科转储版本，相较于基于BM25和DPR的基线问答阅读器模型，我们在精确匹配（EM）分数上获得9-34%的提升，在F1分数上获得13-33%的提升。结果表明，SQuAD-TR使土耳其语的开放问答成为可能，这有望激励研究者在其他低资源语言中构建开放问答系统。我们将所有代码、模型和数据集公开发布。