Many leading language models (LMs) demand intensive computational resources during both training and execution. This poses the challenge of lowering the resource cost of deployment and of speeding up execution in decision-making tasks, among others. We introduce a novel plug-and-play LM framework named Language Optimising Network Distribution (LONDI). LONDI learns to selectively employ large LMs only where complex decision-making and reasoning are required, while using low-resource LMs everywhere else. LONDI consists of a system of two (off-)policy networks, an LM, a large LM (LLM), and a reinforcement learning module that uses switching controls to quickly learn in which system states to call the LLM. We then introduce a variant of LONDI that maintains budget constraints on LLM calls and hence on its resource usage. Theoretically, we prove that LONDI learns the subset of system states in which the LLM must be activated to solve the task. We then prove that LONDI converges to optimal solutions while also preserving budgetary constraints on LLM calls almost surely, enabling it to solve various tasks while significantly lowering computational costs. We test LONDI's performance in a range of tasks in ScienceWorld and BabyAI-Text and demonstrate that LONDI can solve tasks only solvable by resource-intensive LLMs while reducing GPU usage by up to 30%.
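The switching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `SwitchingRouter`, the `switch_score` function (a stand-in for a learned estimate of the benefit of calling the LLM in a given state), the per-call penalty `call_cost`, and the hard per-episode `budget` are all hypothetical names and simplifications introduced here. In LONDI the switching rule is learned by reinforcement learning; in this sketch it is a fixed threshold test.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SwitchingRouter:
    """Toy router that calls the large model only in selected states.

    All fields are illustrative assumptions, not part of LONDI's API.
    """
    small_policy: Callable[[str], str]    # cheap LM policy: state -> action
    large_policy: Callable[[str], str]    # expensive LLM policy: state -> action
    switch_score: Callable[[str], float]  # assumed estimate of the gain from calling the LLM
    budget: int                           # assumed maximum number of LLM calls per episode
    call_cost: float = 0.0                # assumed penalty the gain must exceed
    calls_used: int = 0                   # LLM calls made so far in this episode

    def act(self, state: str) -> str:
        # Call the LLM only if budget remains and the estimated gain beats the cost.
        use_large = (
            self.calls_used < self.budget
            and self.switch_score(state) > self.call_cost
        )
        if use_large:
            self.calls_used += 1
            return self.large_policy(state)
        return self.small_policy(state)


def run_episode(router: SwitchingRouter, states: List[str]) -> List[str]:
    """Route each state in order and collect the chosen actions."""
    return [router.act(s) for s in states]


if __name__ == "__main__":
    demo = SwitchingRouter(
        small_policy=lambda s: "small:" + s,
        large_policy=lambda s: "large:" + s,
        switch_score=lambda s: 1.0 if "hard" in s else 0.0,
        budget=1,
        call_cost=0.5,
    )
    print(run_episode(demo, ["easy-1", "hard-1", "hard-2"]))
    # → ['small:easy-1', 'large:hard-1', 'small:hard-2']
```

With a budget of one call, the second hard state falls back to the small model, which mirrors the budget-constrained variant in spirit only; the paper's variant preserves the constraint through its learning procedure rather than through a hard counter.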