Pretrained language models (PLMs) generalize remarkably well across many tasks and languages. Their generalization to unseen languages, however, is poor: performance drops significantly, and models may even produce nonsensical responses that are no better than a random baseline. This longstanding limitation raises concerns about linguistic diversity and equal access to language modeling technology. In this work, we address it by introducing LinguAlchemy, a regularization technique that incorporates typological, geographical, and phylogenetic aspects of languages, constraining the representations learned by PLMs so that they better reflect the corresponding linguistic properties. Compared to fully finetuned models, LinguAlchemy improves the accuracy of mBERT and XLM-R on unseen languages by ~18% and ~2%, respectively, displaying a high degree of unseen-language generalization. We further introduce AlchemyScale and AlchemyTune, extensions of LinguAlchemy that adjust the linguistic regularization weights automatically, alleviating the need for hyperparameter search. LinguAlchemy enables better cross-lingual generalization to unseen languages, which is vital for the inclusivity and accessibility of PLMs.
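To make the idea of a linguistic regularizer concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the form of the loss, so everything here is an assumption: the names `regularized_loss`, `projection`, `lang_vectors`, and `reg_weight` are hypothetical, the regularizer is taken to be a mean squared error between a linear projection of the model's sentence representation and a per-language feature vector (standing in for typological, geographical, and phylogenetic features), and the two terms are combined with a fixed weight.

```python
import numpy as np


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def regularized_loss(logits, labels, sentence_repr, projection,
                     lang_vectors, reg_weight=1.0):
    """Task loss plus an assumed linguistic regularization term.

    sentence_repr: (batch, hidden) pooled representations from the PLM.
    projection:    (hidden, n_features) hypothetical linear map.
    lang_vectors:  (batch, n_features) hypothetical linguistic feature
                   vector of each example's language.
    """
    task_loss = softmax_cross_entropy(logits, labels)
    # Assumption: the representation is pulled toward the language's
    # feature vector with a mean squared error penalty.
    predicted = sentence_repr @ projection
    ling_loss = np.mean((predicted - lang_vectors) ** 2)
    total = task_loss + reg_weight * ling_loss
    return total, task_loss, ling_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, hidden, n_features, n_classes = 4, 8, 5, 3
    total, task, ling = regularized_loss(
        logits=rng.normal(size=(batch, n_classes)),
        labels=rng.integers(0, n_classes, size=batch),
        sentence_repr=rng.normal(size=(batch, hidden)),
        projection=rng.normal(size=(hidden, n_features)),
        lang_vectors=rng.normal(size=(batch, n_features)),
        reg_weight=0.5,
    )
    print(f"total={total:.4f} task={task:.4f} linguistic={ling:.4f}")
```

In this sketch `reg_weight` is a fixed hyperparameter. The abstract states that AlchemyScale and AlchemyTune adjust the regularization weights automatically; how they do so is not described there, so that part is not sketched here.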