In this work, we introduce various forms of character-level noise when fine-tuning BERT to enable zero-shot cross-lingual transfer to unseen dialects and languages. We fine-tune BERT on three sentence-level classification tasks and evaluate our approach on an assortment of unseen dialects and languages. We find that character-level noise can be an extremely effective agent of cross-lingual transfer under certain conditions, but is less helpful under others. We examine these differences in terms of the nature of the task and the relationship between source and target languages. Introducing character-level noise during fine-tuning is particularly helpful when the task draws on surface-level cues and the source-target language pair has relatively high lexical overlap, with unseen tokens that are shorter (i.e., less meaningful) on average.
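To make the idea of character-level noise concrete, the following is a minimal sketch of one way to perturb training sentences before tokenization. The abstract does not specify which noise operations are used, so the three operations here (deletion, substitution, insertion), the function name `add_char_noise`, and the `rate`, `seed`, and `alphabet` parameters are all illustrative assumptions rather than the method evaluated in this work.

```python
import random
import string


def add_char_noise(text, rate=0.1, seed=None, alphabet=string.ascii_lowercase):
    """Return a copy of `text` with random character-level edits.

    Hypothetical helper: each non-whitespace character is, with
    probability `rate`, deleted, substituted, or followed by an
    inserted random character drawn from `alphabet`.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    out = []
    for ch in text:
        # Keep whitespace intact so word boundaries are preserved,
        # and leave most characters unchanged.
        if ch.isspace() or rng.random() >= rate:
            out.append(ch)
            continue
        op = rng.choice(("delete", "substitute", "insert"))
        if op == "delete":
            continue  # drop the character
        if op == "substitute":
            out.append(rng.choice(alphabet))  # replace with a random character
        else:
            out.append(ch)
            out.append(rng.choice(alphabet))  # insert a random character after it
    return "".join(out)


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog"
    print(add_char_noise(sentence, rate=0.15, seed=0))
```

In a fine-tuning setup, a function like this would plausibly be applied to each source-language training sentence before it is passed to the subword tokenizer, so that the model sees perturbed token sequences resembling those produced by unseen but lexically similar dialects.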