We study model merging as a practical alternative to conventional adaptation strategies for code-mixed NLP. Starting from a multilingual base model, we: (i) perform continued pre-training (CPT) on unlabeled code-mixed text to obtain an adapted checkpoint, (ii) merge the adapted checkpoint with the base model, and (iii) fine-tune (FT) on the downstream task data. We evaluate our approach on sentence classification tasks (sentiment and hate speech) in English-Hindi (En-Hi) and English-Spanish (En-Es) using XLM-R and Llama-3.2-1B models. Our results show that merged models consistently outperform full fine-tuning and CPT->FT. We observe gains of 2--5 F1 points over full fine-tuning and ~1--2 points over CPT->FT, indicating that unlabeled data is leveraged more effectively via merging than via CPT alone. Zero-/few-shot prompting with larger LLMs (e.g., Llama-3.3-70B) lags behind fine-tuned and merged checkpoints, underscoring the limits of in-context learning for code-mixed inputs. We further test cross-pair transfer by training on En-Hi and evaluating on En-Ta and En-Ml: merged checkpoints transfer more strongly than monolingual-English baselines (e.g., TV/TIES variants reaching 0.65--0.68 F1 vs. 0.61--0.63 for full fine-tuning), suggesting that code-mixed knowledge is a more reliable substrate for low-resource pairs. We conclude with adaptation recipes matched to common data regimes (labeled only; labeled+unlabeled; transfer-only) and discuss limitations and scaling considerations for broader tasks and larger models.
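Below is a minimal sketch of the merge step (ii), using simple task-vector arithmetic of the kind underlying the TV variant mentioned above (TIES additionally trims small-magnitude deltas and resolves sign conflicts before combining). The model identifiers, checkpoint path, and scaling factor alpha are illustrative assumptions, not the exact experimental configuration.

```python
# A minimal sketch of step (ii): merging the CPT-adapted checkpoint back into the
# base model via task-vector arithmetic, theta_merged = theta_base + alpha * (theta_cpt - theta_base).
# Model names, the checkpoint path, and alpha are illustrative assumptions, not the paper's exact setup.
from transformers import AutoModel


def merge_task_vector(base_model, cpt_model, alpha=0.5):
    """Merge a CPT-adapted checkpoint into the base model and return the merged model."""
    base_state = base_model.state_dict()
    cpt_state = cpt_model.state_dict()
    merged_state = {}
    for name, base_param in base_state.items():
        if (
            name in cpt_state
            and cpt_state[name].shape == base_param.shape
            and base_param.is_floating_point()
        ):
            # Task vector: the weight delta induced by continued pre-training on code-mixed text.
            delta = cpt_state[name] - base_param
            merged_state[name] = base_param + alpha * delta
        else:
            # Parameters absent from (or reshaped in) the CPT checkpoint keep the base weights.
            merged_state[name] = base_param
    base_model.load_state_dict(merged_state)
    return base_model


# Usage: load the multilingual base and the CPT checkpoint, merge, then fine-tune the result (step iii).
base = AutoModel.from_pretrained("xlm-roberta-base")
cpt = AutoModel.from_pretrained("path/to/cpt-adapted-checkpoint")  # hypothetical local checkpoint
merged = merge_task_vector(base, cpt, alpha=0.5)
```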