Fine-tuning a task-specific multilingual large language model (LLM) involves training the model on a multilingual dataset with examples in all the required languages. Updating one or more supported languages with additional data, or adding support for a new language, requires retraining the model, which is computationally inefficient and creates a severe maintenance bottleneck. Recent research on merging multilingual multitask models has shown promise in terms of quality, but its computational and maintenance efficiency remains unstudied. In this work, we provide the first focused analysis of this merging strategy from an efficiency perspective, evaluating it across three independent tasks. We demonstrate significant efficiency gains while maintaining quality parity: the merging approach reduces initial training time by up to 50\%. We also show that updating an individual language and re-merging as part of model maintenance reduces training costs by more than 60\% compared to retraining the full multilingual model. We validate these results on both public and proprietary industry datasets, confirming that the approach works well for industrial use cases in addition to the academic settings studied in previous work.
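The maintenance workflow described above can be sketched in a few lines. This is a minimal illustration, assuming the simplest possible merging scheme (element-wise parameter averaging over per-language fine-tuned models); the paper's actual merging method and model representation may differ, and `merge_models` is a hypothetical helper introduced here for illustration. The point is the workflow: per-language models are trained independently, merged once, and when one language's data changes, only that model is retrained before re-merging.

```python
# Minimal sketch of weight-space model merging via parameter averaging.
# Assumption: each per-language model is represented as a dict mapping
# parameter names to flat lists of floats (a stand-in for real tensors).

def merge_models(models):
    """Element-wise average of parameters across per-language models."""
    n = len(models)
    return {
        name: [sum(vals) / n for vals in zip(*(m[name] for m in models))]
        for name in models[0]
    }

# Initial merge: one independently fine-tuned model per language.
en = {"w": [0.2, 0.4]}
de = {"w": [0.4, 0.8]}
fr = {"w": [0.6, 1.2]}
merged = merge_models([en, de, fr])

# Maintenance: new data arrives for one language, so only that model is
# retrained; the other per-language models are reused and re-merged,
# avoiding a full multilingual retrain.
de_updated = {"w": [0.1, 0.2]}
merged_v2 = merge_models([en, de_updated, fr])
```

Because the per-language models never see each other's data, the merge and re-merge steps are cheap relative to a full multilingual fine-tuning run, which is the source of the efficiency gains reported above.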