Current multilingual semantic parsing (MSP) datasets are almost all collected by translating the utterances of existing datasets from a resource-rich language into the target language. However, manual translation is costly. To reduce the translation effort, this paper proposes the first active learning procedure for MSP (AL-MSP), which selects only a subset of the existing datasets to be translated. We also propose a novel selection method that prioritizes examples that diversify the logical form structures with more lexical choices, and a novel hyperparameter tuning method that incurs no extra annotation cost. Our experiments show that AL-MSP, given an ideal selection method, significantly reduces translation costs. With proper hyperparameters, our selection method yields better parsing performance than the baselines on two multilingual datasets.
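To make the idea of selecting a subset for translation concrete, the following is a minimal sketch of a greedy, diversity-driven selection step. It is an illustration under stated assumptions, not the selection method proposed in this paper: the names `structure_of` and `select_for_translation`, the masking of quoted constants to approximate a logical form structure, and the scoring rule (prefer rarely chosen structures, then utterances with more unseen words) are all hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple

# (source-language utterance, logical form); a hypothetical data layout.
Example = Tuple[str, str]


def structure_of(logical_form: str) -> str:
    """Approximate the logical form structure by masking quoted constants (assumption)."""
    return re.sub(r'"[^"]*"', "<ENT>", logical_form)


def select_for_translation(pool: List[Example], budget: int) -> List[int]:
    """Greedily pick up to `budget` pool indices to send to human translators.

    Illustrative heuristic only: prefer examples whose structure has been
    selected least often so far, and break ties by the number of words not
    yet covered by the selected utterances.
    """
    selected: List[int] = []
    structure_counts: Counter = Counter()
    seen_words = set()
    remaining = list(range(len(pool)))

    while remaining and len(selected) < budget:
        def score(i: int) -> Tuple[int, int]:
            utterance, logical_form = pool[i]
            new_words = len(set(utterance.lower().split()) - seen_words)
            return (-structure_counts[structure_of(logical_form)], new_words)

        best = max(remaining, key=score)
        remaining.remove(best)
        selected.append(best)

        utterance, logical_form = pool[best]
        structure_counts[structure_of(logical_form)] += 1
        seen_words.update(utterance.lower().split())

    return selected


if __name__ == "__main__":
    pool = [
        ("what is the capital of texas", 'answer(capital(state("texas")))'),
        ("what is the capital of ohio", 'answer(capital(state("ohio")))'),
        ("how many rivers run through utah",
         'answer(count(river(traverse(state("utah")))))'),
        ("name the capital city of the state of iowa",
         'answer(capital(state("iowa")))'),
    ]
    for index in select_for_translation(pool, budget=2):
        print(pool[index][0])
```

In a full active learning loop, a step like this would be repeated over several rounds: the selected utterances are translated, the parser is retrained on the enlarged target-language set, and the next batch is chosen from the remaining pool.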