Dialect adapters that improve the performance of LLMs on NLU tasks for certain sociolects/dialects/national varieties ('dialects' for brevity) have been reported for encoder models. In this paper, we extend the idea of dialect adapters to decoder models in an architecture we call LoRDD. Using MD-3, a publicly available dataset of word game-playing conversations between dialectal speakers, our task is Target Word Prediction (TWP) from a masked conversation. LoRDD combines task adapters and dialect adapters, where the latter employ contrastive learning on pseudo-parallel conversations from MD-3. Our experiments on Indian English and Nigerian English conversations with two models (Mistral and Gemma) show that LoRDD outperforms four baselines on TWP. Additionally, it significantly narrows the performance gap with American English, to 12% and 5.8% on word similarity and to 25% and 4.5% on accuracy, respectively. The focused contribution of LoRDD lies in its promise for dialect adaptation of decoder models via TWP, a simplified version of the commonly used next-word prediction task.
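The abstract does not spell out the contrastive objective used to train the dialect adapters. Below is a minimal sketch of one common instantiation, an in-batch InfoNCE loss over embeddings of pseudo-parallel conversation pairs; the function name, temperature, and the use of in-batch negatives are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def contrastive_dialect_loss(anchor: torch.Tensor,
                             positive: torch.Tensor,
                             temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE-style loss over a batch of pseudo-parallel conversation
    embeddings: each dialectal conversation (anchor) is pulled toward its
    paired conversation in the reference variety (positive), while the
    other pairs in the batch act as negatives. Hypothetical sketch, not
    the paper's exact loss."""
    anchor = F.normalize(anchor, dim=-1)        # (B, d) unit-norm embeddings
    positive = F.normalize(positive, dim=-1)    # (B, d)
    logits = anchor @ positive.T / temperature  # (B, B) pairwise similarities
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)     # diagonal entries are the positives

# Toy usage: 4 pseudo-parallel pairs with 16-dim embeddings.
if __name__ == "__main__":
    torch.manual_seed(0)
    a = torch.randn(4, 16)  # e.g. Indian/Nigerian English conversation embeddings
    p = torch.randn(4, 16)  # paired reference-variety conversation embeddings
    print(contrastive_dialect_loss(a, p).item())
```

In practice, such embeddings would come from the adapter-equipped decoder (e.g., mean-pooled hidden states), so that the loss pulls dialectal representations toward the reference variety while the task adapter handles TWP itself.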