Existing large language models (LLMs), which mainly focus on Standard American English (SAE), often perform significantly worse when applied to other English dialects. While existing mitigations tackle discrepancies for individual target dialects, they assume access to high-accuracy dialect identification systems. However, the boundaries between dialects are inherently flexible, making it difficult to sort language into discrete, predefined categories. In this paper, we propose DADA (Dialect Adaptation via Dynamic Aggregation), a modular approach that imbues SAE-trained models with multi-dialectal robustness by composing adapters, each of which handles a specific linguistic feature. The compositional architecture of DADA allows for both targeted adaptation to specific dialect variants and simultaneous adaptation to multiple dialects. We show that DADA is effective for both single-task and instruction-finetuned language models, offering an extensible and interpretable framework for adapting existing LLMs to different English dialects.
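To make the idea of composing feature-level adapters concrete, the following is a minimal sketch of one way a dynamic aggregation step could look. It is not the authors' implementation: the abstract does not specify the aggregation mechanism, so the attention-style softmax weighting, the identity path for unmodified input, and the names `make_offset_adapter` and `aggregate` are all assumptions made for illustration. The "adapters" here are toy functions, not trained modules.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Adapter = Callable[[Sequence[float]], Vector]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def make_offset_adapter(offset: Sequence[float]) -> Adapter:
    """Hypothetical stand-in for a trained feature adapter: adds a fixed offset."""
    def adapter(hidden: Sequence[float]) -> Vector:
        return [h + o for h, o in zip(hidden, offset)]
    return adapter


def aggregate(hidden: Sequence[float],
              adapters: Sequence[Adapter]) -> Tuple[Vector, List[float]]:
    """Mix adapter outputs with input-dependent weights.

    Assumption: the hidden state acts as the query and each candidate
    output as its own key, so weights are softmax(hidden . output).
    An identity path (index 0) lets the input pass through unchanged.
    """
    outputs = [list(hidden)] + [adapter(hidden) for adapter in adapters]
    weights = softmax([dot(hidden, out) for out in outputs])
    mixed = [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(hidden))
    ]
    return mixed, weights


# Two toy feature adapters applied to a 2-dimensional hidden state.
hidden = [1.0, 0.0]
adapters = [make_offset_adapter([0.5, 0.0]), make_offset_adapter([0.0, 0.5])]
mixed, weights = aggregate(hidden, adapters)
print(weights)  # one weight for the identity path plus one per adapter
print(mixed)
```

Because the weights are computed per input, no dialect label is needed at inference time, and inspecting `weights` shows which feature adapters were activated for a given input. This is one plausible reading of the extensibility and interpretability the abstract describes.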