Counterfactuals are minimally edited inputs that change a model's prediction, and they serve as a promising approach to explaining model behavior. Large language models (LLMs) excel at generating English counterfactuals and demonstrate multilingual proficiency; however, their effectiveness at generating multilingual counterfactuals remains unclear. To this end, we conduct a comprehensive study of multilingual counterfactuals. First, we run automatic evaluations across six languages, comparing counterfactuals generated directly in each target language with those derived via English translation. Although translation-based counterfactuals achieve higher validity than their directly generated counterparts, they require substantially more edits and still fall short of the quality of the original English counterfactuals. Second, we find that the edit patterns applied to counterfactuals in high-resource European languages are remarkably similar, suggesting that cross-lingual perturbations follow common strategic principles. Third, we identify and categorize four main types of errors that consistently appear in generated counterfactuals across languages. Finally, we show that multilingual counterfactual data augmentation (CDA) yields larger model performance improvements than cross-lingual CDA, especially for lower-resource languages. Even so, imperfections in the generated counterfactuals limit the gains in model performance and robustness.