All Language Models Large and Small

Many leading language models (LMs) use high-intensity computational resources both during training and during execution. This poses the challenge of lowering the resource costs of deployment and of speeding up the execution of decision-making tasks, among others. We introduce a novel plug-and-play LM framework named Language Optimising Network Distribution (LONDI). LONDI learns to selectively employ large LMs only where complex decision-making and reasoning are required, while using low-resource LMs everywhere else. LONDI consists of a system of two (off-)policy networks, an LM, a large LM (LLM), and a reinforcement learning module that uses switching controls to quickly learn in which system states to call the LLM. We then introduce a variant of LONDI that maintains budget constraints on LLM calls and hence on its resource usage. Theoretically, we prove that LONDI learns the subset of system states in which the LLM must be activated to solve the task. We then prove that LONDI converges to optimal solutions while also preserving budgetary constraints on LLM calls almost surely, enabling it to solve various tasks while significantly lowering computational costs. We test LONDI's performance on a range of tasks in ScienceWorld and BabyAI-Text and demonstrate that LONDI can solve tasks that only resource-intensive LLMs can solve, while reducing GPU usage by up to 30%.
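To make the switching idea concrete, here is a minimal Python sketch under loudly stated assumptions; it is not the authors' implementation. It replaces LONDI's reinforcement learning switching-control module with a single-step (contextual-bandit) tabular learner on a hypothetical toy task. States in the hypothetical set `HARD_STATES` can only be solved by the LLM, every other state is solved by the small LM, and each LLM call is charged a fixed hypothetical penalty `LLM_COST` as a stand-in for compute cost. The budgeted variant is approximated by simply refusing LLM calls once a hypothetical `budget` is spent, which is a simplification rather than the paper's budget-constrained method. All function names (`step_reward`, `train_switch_controller`, `greedy_switch`, `run_with_budget`) are illustrative.

```python
import random
from collections import defaultdict

# Hypothetical toy task (assumption, not from the paper): states 0..N_STATES-1.
# The small LM solves only "easy" states; the LLM solves every state.
N_STATES = 10
HARD_STATES = {2, 5, 7}  # hypothetical states that need complex reasoning
LLM_COST = 0.3           # hypothetical per-call penalty standing in for compute cost


def step_reward(state, use_llm):
    """Task reward (1 if solved, else 0) minus the LLM call penalty."""
    solved = use_llm or state not in HARD_STATES
    return (1.0 if solved else 0.0) - (LLM_COST if use_llm else 0.0)


def train_switch_controller(episodes=2000, epsilon=0.1, seed=0):
    """Learn a binary switch per state (0 = small LM, 1 = LLM).

    Simplification: a one-step (contextual-bandit) learner with
    sample-average value estimates, not a full multi-step RL agent.
    """
    rng = random.Random(seed)
    q = defaultdict(lambda: 1.0)  # optimistic init (max reward) so untried models get tried
    counts = defaultdict(int)
    for _ in range(episodes):
        s = rng.randrange(N_STATES)
        if rng.random() < epsilon:
            a = rng.randrange(2)     # explore
        else:
            a = greedy_switch(q, s)  # exploit
        r = step_reward(s, a == 1)
        counts[(s, a)] += 1
        q[(s, a)] += (r - q[(s, a)]) / counts[(s, a)]  # running mean of observed rewards
    return q


def greedy_switch(q, state):
    """Pick the model with the higher estimated value; ties go to the small LM."""
    return 1 if q[(state, 1)] > q[(state, 0)] else 0


def run_with_budget(q, states, budget):
    """Follow the learned switch, but refuse LLM calls once `budget` calls are used."""
    calls, total = 0, 0.0
    for s in states:
        use_llm = greedy_switch(q, s) == 1 and calls < budget
        calls += int(use_llm)
        total += step_reward(s, use_llm)
    return calls, total


if __name__ == "__main__":
    q = train_switch_controller()
    llm_states = [s for s in range(N_STATES) if greedy_switch(q, s) == 1]
    print("LLM called in states:", llm_states)
    print("unbudgeted:", run_with_budget(q, range(N_STATES), budget=N_STATES))
    print("budget of 2:", run_with_budget(q, range(N_STATES), budget=2))
```

Because the hypothetical penalty is smaller than the reward the LLM unlocks in hard states but still positive, the learned switch calls the LLM only in hard states and keeps the small LM elsewhere. In a real multi-step task, the switch would also have to weigh the future consequences of each call, which is what a full reinforcement learning formulation provides and this one-step learner does not.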


