Octopus v4: Graph of language models

Language models have been effective in a wide range of applications, yet the most sophisticated models are often proprietary. For example, GPT-4 by OpenAI and various models by Anthropic are expensive and consume substantial energy. In contrast, the open-source community has produced competitive models, such as Llama3. Furthermore, smaller niche-specific language models, such as those tailored for legal, medical, or financial tasks, have outperformed their proprietary counterparts. This paper introduces a novel approach that employs \textit{functional tokens} to integrate \textbf{multiple open-source models}, each optimized for particular tasks. Our newly developed Octopus v4 model leverages \textit{functional tokens} to intelligently direct user queries to the most appropriate vertical model and to reformat the query for the best performance. Octopus v4, an evolution of the Octopus v1, v2, and v3 models, excels at model selection, parameter understanding, and query reformatting. Additionally, we explore the use of a graph as a versatile data structure that effectively coordinates multiple open-source models by harnessing the capabilities of the Octopus model and \textit{functional tokens}. Use our open-sourced GitHub (\url{https://www.nexa4ai.com/}) to try Octopus v4 models (\url{https://huggingface.co/NexaAIDev/Octopus-v4}), and contribute to a larger graph of language models. By activating models with fewer than 10B parameters, we achieve a SOTA MMLU score of 74.8 among models of the same scale.
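The routing idea in the abstract can be illustrated with a small sketch. It is not the Octopus v4 implementation: the router here is a keyword matcher standing in for the trained model, and the token names (`<fn_legal>`, `<fn_medical>`, `<fn_general>`), the `WorkerNode` and `ModelGraph` classes, and the bracketed reformatting are all hypothetical names chosen for illustration. The sketch only shows the shape of the flow: a master node emits a functional token that selects a worker node in the graph, then passes a reformatted query to that worker.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class WorkerNode:
    """A vertical (domain-specific) model reachable from the router."""
    token: str                      # hypothetical functional token naming this node
    domain: str
    keywords: Tuple[str, ...]       # stand-in signal; a real router is a trained model
    generate: Callable[[str], str]  # stand-in for a call to the actual vertical model


@dataclass
class ModelGraph:
    """One master (router) node with an edge to each registered worker node."""
    workers: Dict[str, WorkerNode] = field(default_factory=dict)
    default_token: str = "<fn_general>"  # assumption: a general-purpose fallback node

    def add_worker(self, node: WorkerNode) -> None:
        self.workers[node.token] = node

    def route(self, query: str) -> Tuple[str, str]:
        """Return (functional token, reformatted query) for a user query."""
        text = query.lower()
        best_token, best_hits = self.default_token, 0
        for token, node in self.workers.items():
            hits = sum(kw in text for kw in node.keywords)
            if hits > best_hits:
                best_token, best_hits = token, hits
        # Illustrative reformatting only: trim whitespace and tag the domain.
        reformatted = f"[{self.workers[best_token].domain}] {query.strip()}"
        return best_token, reformatted

    def answer(self, query: str) -> str:
        """Route the query, then let the selected worker node respond."""
        token, reformatted = self.route(query)
        return self.workers[token].generate(reformatted)


def build_demo_graph() -> ModelGraph:
    """Build a three-node demo graph with echoing stand-in models."""
    graph = ModelGraph()
    specs = [
        ("<fn_general>", "general", ()),
        ("<fn_legal>", "legal", ("contract", "liability")),
        ("<fn_medical>", "medical", ("dosage", "symptom")),
    ]
    for token, domain, keywords in specs:
        # Bind domain as a default argument so each lambda keeps its own value.
        echo = lambda q, d=domain: f"{d} model received: {q}"
        graph.add_worker(WorkerNode(token, domain, keywords, echo))
    return graph


if __name__ == "__main__":
    demo = build_demo_graph()
    print(demo.route("What dosage of ibuprofen is safe?"))
    print(demo.answer("Is this contract enforceable?"))
```

In this sketch, adding a new vertical model amounts to registering one more node and its token, which is the property that makes a graph a convenient structure for growing the set of coordinated models.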

