Almost all current multilingual semantic parsing (MSP) datasets are built by translating the utterances of existing datasets from a resource-rich language into the target language. However, manual translation is costly. To reduce the translation effort, this paper proposes the first active learning procedure for MSP (AL-MSP). AL-MSP selects only a subset of the existing datasets to be translated. We also propose a novel selection method that prioritizes examples which diversify the logical form structures and offer more lexical choices, and a novel hyperparameter tuning method that incurs no extra annotation cost. Our experiments show that AL-MSP significantly reduces translation costs when paired with ideal selection methods. With proper hyperparameters, our selection method yields better parsing performance than the baselines on two multilingual datasets.
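
To make the idea of selecting a subset for translation concrete, the following is a minimal sketch of one selection round under a fixed translation budget. It is not the selection method proposed in this paper: the scoring rule (count of unseen logical form tokens plus a weighted count of unseen utterance words) and the names `select_for_translation`, `budget`, and `lexical_weight` are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple

# Each pool item pairs a source-language utterance with its logical form.
Example = Tuple[str, str]


def select_for_translation(
    pool: List[Example], budget: int, lexical_weight: float = 0.5
) -> List[int]:
    """Greedily pick up to `budget` pool indices to send for translation.

    Illustrative heuristic (an assumption, not the paper's method): prefer
    examples that add unseen logical form tokens, and break near-ties in
    favor of examples that add unseen utterance words.
    """
    seen_lf: Set[str] = set()
    seen_words: Set[str] = set()
    chosen: List[int] = []
    remaining = list(range(len(pool)))

    def gain(i: int) -> float:
        utterance, logical_form = pool[i]
        new_lf = len(set(logical_form.split()) - seen_lf)
        new_words = len(set(utterance.lower().split()) - seen_words)
        return new_lf + lexical_weight * new_words

    while remaining and len(chosen) < budget:
        best = max(remaining, key=gain)  # first index wins on ties
        chosen.append(best)
        remaining.remove(best)
        utterance, logical_form = pool[best]
        seen_lf |= set(logical_form.split())
        seen_words |= set(utterance.lower().split())
    return chosen


pool = [
    ("list flights to boston", "( flight ( to boston ) )"),
    ("show flights to boston", "( flight ( to boston ) )"),
    ("which airlines fly to denver", "( airline ( to denver ) )"),
]
selected = select_for_translation(pool, budget=2)
```

In a full active learning loop, the selected utterances would be translated manually, the parser retrained on the enlarged target-language set, and the selection repeated on the remaining pool.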