We study model merging as a practical alternative to conventional adaptation strategies for code-mixed NLP. Starting from a multilingual base model, we: (i) perform continued pre-training (CPT) on unlabeled code-mixed text to obtain an adapted checkpoint, (ii) merge the adapted checkpoint with the base model, and (iii) fine-tune (FT) on the downstream task data. We evaluate our approach on sentence classification tasks (sentiment and hate speech) in English-Hindi (En-Hi) and English-Spanish (En-Es) using XLM-R and Llama-3.2-1B models. Our results show that merged models consistently outperform full fine-tuning and CPT->FT. We observe gains of 2--5 F1 points over full fine-tuning and ~1--2 points over CPT->FT, indicating that unlabeled data is leveraged more effectively via merging than via CPT alone. Zero-/few-shot prompting with larger LLMs (e.g., Llama-3.3-70B) lags behind fine-tuned and merged checkpoints, underscoring the limits of in-context learning for code-mixed inputs. We further test cross-pair transfer by training on En-Hi and evaluating on En-Ta and En-Ml: merged checkpoints transfer more strongly than monolingual-English baselines (e.g., TV/TIES variants reaching 0.65--0.68 F1 vs. 0.61--0.63 for full fine-tuning), suggesting that code-mixed knowledge is a more reliable substrate for low-resource pairs. We conclude with adaptation recipes matched to common data regimes (labeled only; labeled+unlabeled; transfer-only) and discuss limitations and scaling considerations for broader tasks and larger models.
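Below is a minimal sketch of the merge step (ii), using simple task-vector arithmetic of the kind underlying the TV variant mentioned above (TIES additionally trims small-magnitude deltas and resolves sign conflicts before combining). The model identifiers, checkpoint path, and scaling factor alpha are illustrative assumptions, not the exact experimental configuration.

```python
# A minimal sketch of step (ii): merging the CPT-adapted checkpoint back into the
# base model via task-vector arithmetic, theta_merged = theta_base + alpha * (theta_cpt - theta_base).
# Model names, the checkpoint path, and alpha are illustrative assumptions, not the paper's exact setup.
from transformers import AutoModel


def merge_task_vector(base_model, cpt_model, alpha=0.5):
    """Merge a CPT-adapted checkpoint into the base model and return the merged model."""
    base_state = base_model.state_dict()
    cpt_state = cpt_model.state_dict()
    merged_state = {}
    for name, base_param in base_state.items():
        if (
            name in cpt_state
            and cpt_state[name].shape == base_param.shape
            and base_param.is_floating_point()
        ):
            # Task vector: the weight delta induced by continued pre-training on code-mixed text.
            delta = cpt_state[name] - base_param
            merged_state[name] = base_param + alpha * delta
        else:
            # Parameters absent from (or reshaped in) the CPT checkpoint keep the base weights.
            merged_state[name] = base_param
    base_model.load_state_dict(merged_state)
    return base_model


# Usage: load the multilingual base and the CPT checkpoint, merge, then fine-tune the result (step iii).
base = AutoModel.from_pretrained("xlm-roberta-base")
cpt = AutoModel.from_pretrained("path/to/cpt-adapted-checkpoint")  # hypothetical local checkpoint
merged = merge_task_vector(base, cpt, alpha=0.5)
```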