Compositional generalization enables efficient learning and human-like inductive biases. Since most research investigating compositional generalization in NLP is conducted on English, important questions remain underexplored. Do the necessary compositional generalization abilities differ across languages? Can models compositionally generalize cross-lingually? As a first step toward answering these questions, recent work used neural machine translation to translate datasets for evaluating compositional generalization in semantic parsing. However, we show that this entails critical semantic distortion. To address this limitation, we craft a faithful rule-based translation of the MCWQ dataset from English to Chinese and Japanese. Even with the resulting robust benchmark, which we call MCWQ-R, we show that the distribution of compositions is still affected by linguistic divergences, and that multilingual models still struggle with cross-lingual compositional generalization. Our dataset and methodology should be useful resources for the study of cross-lingual compositional generalization in other tasks.