Recent work shows that assembling multiple off-the-shelf large language models (LLMs) can harness their complementary abilities. Routing is a promising approach to this end: a router is learned to select the most suitable LLM for each query. However, existing routing models are ineffective when multiple LLMs perform well on a query. To address this problem, we propose a query-based Router trained by Dual Contrastive learning (RouterDC). The RouterDC model consists of an encoder and a set of learnable LLM embeddings, and we propose two contrastive learning losses to train it. Experimental results show that RouterDC is effective in assembling LLMs and substantially outperforms both the individual top-performing LLMs and existing routing methods on in-distribution (+2.76%) and out-of-distribution (+1.90%) tasks. Source code is available at https://github.com/shuhao02/RouterDC.
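The routing mechanism described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the abstract's design of an encoder plus one learnable embedding per LLM, routes a query to the LLM whose embedding has the highest cosine similarity to the encoded query, and uses a hypothetical stand-in encoder (a query-seeded random projection) in place of a trained text encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_LLMS, DIM = 4, 8
# One embedding vector per candidate LLM; in RouterDC these are
# learned with the two contrastive losses, here they are random.
llm_embeddings = rng.normal(size=(NUM_LLMS, DIM))

def encode(query: str) -> np.ndarray:
    """Hypothetical encoder: a query-seeded random projection.
    A real router would use a trained text encoder here."""
    q_rng = np.random.default_rng(abs(hash(query)) % (2**32))
    return q_rng.normal(size=DIM)

def route(query: str) -> int:
    """Return the index of the LLM whose embedding is most
    similar (by cosine similarity) to the encoded query."""
    q = encode(query)
    q = q / np.linalg.norm(q)
    e = llm_embeddings / np.linalg.norm(llm_embeddings, axis=1, keepdims=True)
    return int(np.argmax(e @ q))

chosen = route("What is the capital of France?")
```

At inference time only the chosen LLM is invoked, so routing adds negligible cost compared with querying every model in the ensemble.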