Vision-Language Models (VLMs) excel at diverse multimodal tasks. However, user requirements vary across scenarios and can be broadly categorized as fast response, high-quality output, and low energy consumption. Relying solely on large cloud-deployed models for all queries often incurs high latency and energy cost, whereas small models deployed on edge devices can handle simpler tasks with low latency and energy cost. To fully leverage the strengths of both large and small models, we propose ECVL-ROUTER, the first scenario-aware routing framework for VLMs. Our approach introduces a new routing strategy and evaluation metrics that dynamically select the appropriate model for each query based on user requirements, maximizing overall utility. We also construct a multimodal response-quality dataset tailored for router training and validate the approach through extensive experiments. Results show that our approach successfully routes over 80\% of queries to the small model while incurring less than a 10\% drop in problem-solving probability.
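The scenario-aware routing decision can be illustrated with a minimal sketch: score each candidate model by a user-weighted trade-off between estimated solve probability and latency/energy cost, then dispatch to the higher-scoring one. All function names, weights, and cost values here are hypothetical illustrations, not the actual ECVL-ROUTER strategy or its trained router.

```python
# Hypothetical sketch of scenario-aware routing between a small edge model
# and a large cloud model. Weights, costs, and quality estimates are
# illustrative assumptions, not the paper's actual method.

def route(p_small: float, p_large: float,
          latency: tuple[float, float],
          energy: tuple[float, float],
          prefs: dict[str, float]) -> str:
    """Pick the model with the higher scenario-weighted utility.

    p_small / p_large : estimated probability each model solves the query
    latency, energy   : (small, large) normalized costs in [0, 1]
    prefs             : weights for "quality", "speed", "energy"
    """
    def utility(p: float, lat: float, en: float) -> float:
        # Reward expected solve probability; penalize latency and energy.
        return (prefs["quality"] * p
                - prefs["speed"] * lat
                - prefs["energy"] * en)

    u_small = utility(p_small, latency[0], energy[0])
    u_large = utility(p_large, latency[1], energy[1])
    return "small" if u_small >= u_large else "large"

# A speed-sensitive scenario favors the edge model despite a quality gap.
choice = route(p_small=0.75, p_large=0.92,
               latency=(0.1, 0.8), energy=(0.1, 0.9),
               prefs={"quality": 0.3, "speed": 0.5, "energy": 0.2})
```

Under these illustrative numbers the speed-weighted scenario routes to the small model, while shifting nearly all weight onto quality would flip the decision to the large model, mirroring how user requirements steer the routing.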