Bagging-Based Model Merging for Robust General Text Embeddings

General-purpose text embedding models underpin a wide range of NLP and information retrieval applications, and are typically trained on large-scale multi-task corpora to encourage broad generalization. However, it remains unclear how different multi-task training strategies compare in practice, and how to efficiently adapt embedding models as new domains and data types continually emerge. In this work, we present a systematic study of multi-task training for text embeddings from two perspectives: data scheduling and model merging. We compare batch-level shuffling, sequential training variants, two-stage training, and multiple merging granularities, and find that simple batch-level shuffling consistently yields the strongest overall performance, suggesting that task conflicts are limited and training datasets are largely complementary. Despite its effectiveness, batch-level shuffling exhibits two practical limitations: suboptimal out-of-domain (OOD) generalization and poor suitability for incremental learning due to expensive full retraining. To address these issues, we propose Bagging-based rObust mOdel Merging (BOOM), which trains multiple embedding models on sampled subsets and merges them into a single model, improving robustness while retaining single-model inference efficiency. Moreover, BOOM naturally supports efficient incremental updates by training lightweight update models on new data with a small historical subset and merging them into the existing model. Experiments across diverse embedding benchmarks demonstrate that BOOM consistently improves both in-domain and OOD performance over full-corpus batch-level shuffling, while substantially reducing training cost in incremental learning settings.
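
To make the bagging-and-merge recipe described above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that merging is a weighted average of parameters, which the abstract does not specify. The names `sample_subset`, `merge_params`, `bagging_merge`, `incremental_update`, and `toy_train`, as well as the parameters `ratio`, `replay_ratio`, and `new_weight` and their default values, are hypothetical and chosen only for illustration. Models are represented as plain dictionaries of flat weight lists so that the example runs without any deep learning framework.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical model representation: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def sample_subset(corpus: Sequence, ratio: float, rng: random.Random) -> list:
    """Draw a random subset (without replacement) of the corpus."""
    k = max(1, int(len(corpus) * ratio))
    return rng.sample(list(corpus), k)


def merge_params(models: List[Params], weights: List[float]) -> Params:
    """Weighted average of parameters (ASSUMPTION: the merge is an average).

    All models must share the same parameter names and sizes.
    """
    total = sum(weights)
    merged: Params = {}
    for name in models[0]:
        size = len(models[0][name])
        merged[name] = [
            sum(w * m[name][i] for m, w in zip(models, weights)) / total
            for i in range(size)
        ]
    return merged


def bagging_merge(
    corpus: Sequence,
    train_fn: Callable[[list], Params],
    num_models: int = 4,
    ratio: float = 0.5,
    seed: int = 0,
) -> Params:
    """Train one model per sampled subset, then merge them into one model."""
    rng = random.Random(seed)
    models = [train_fn(sample_subset(corpus, ratio, rng)) for _ in range(num_models)]
    return merge_params(models, [1.0] * num_models)


def incremental_update(
    current: Params,
    new_data: Sequence,
    history: Sequence,
    train_fn: Callable[[list], Params],
    replay_ratio: float = 0.1,
    new_weight: float = 0.25,
    seed: int = 0,
) -> Params:
    """Train an update model on new data plus a small historical subset,
    then merge it into the existing model instead of retraining from scratch."""
    rng = random.Random(seed)
    replay = sample_subset(history, replay_ratio, rng)
    update_model = train_fn(list(new_data) + replay)
    return merge_params([current, update_model], [1.0 - new_weight, new_weight])


def toy_train(examples: list) -> Params:
    """Stand-in for real embedding training: the 'model' is the data mean."""
    return {"w": [sum(examples) / len(examples)]}


if __name__ == "__main__":
    corpus = [float(i) for i in range(100)]
    merged = bagging_merge(corpus, toy_train)
    updated = incremental_update(merged, [200.0, 210.0], corpus, toy_train)
    print(merged, updated)
```

In a real setting, `toy_train` would be replaced by contrastive fine-tuning of an embedding model, and the dictionaries by framework-specific parameter containers. The merged result is a single set of parameters, so inference cost matches that of one model regardless of how many subset models were trained.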
