Online Foundation Model Selection in Robotics

Foundation models have recently expanded into robotics after excelling in computer vision and natural language processing. They are available in two forms: open-source models and paid, closed-source models. Users with access to both must choose between effective yet costly closed-source models and free but less powerful open-source alternatives. We call this the model selection problem. Existing supervised-learning methods are impractical here because collecting extensive training data from closed-source models is expensive. Hence, we focus on the online learning setting, where algorithms learn while collecting data, eliminating the need for large pre-collected datasets. We formulate a user-centric online model selection problem and propose a novel solution that combines an open-source encoder, which outputs context, with an online learning algorithm that processes this context. The encoder distills vast data distributions into low-dimensional features, i.e., the context, without additional training. Given the context extracted from the data, the online learning algorithm aims to maximize a composite reward that accounts for model performance, execution time, and cost. This yields a better trade-off between selecting open-source and closed-source models than non-contextual methods, as validated by our theoretical analysis. Experiments on language-based robotic tasks drawn from the Waymo Open Dataset, ALFRED, and Open X-Embodiment demonstrate real-world applications of the solution. The results show that the solution improves the task success rate by up to 14%.
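The abstract does not name the online learning algorithm, so the following is only a minimal sketch of the general idea, using disjoint LinUCB (a standard contextual bandit) as a stand-in. Everything specific here is an assumption made for illustration: the two-dimensional one-hot "easy/hard" context standing in for the encoder output, the success probabilities, the latencies and prices, and the reward weights `w_time` and `w_cost`.

```python
import numpy as np


class LinUCBSelector:
    """Disjoint LinUCB: one ridge-regression reward model per candidate model."""

    def __init__(self, n_models, dim, alpha=2.0):
        self.alpha = alpha  # exploration strength (assumed value)
        self.A = [np.eye(dim) for _ in range(n_models)]    # per-arm Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_models)]  # per-arm reward sums

    def select(self, context):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b  # ridge estimate of the reward weights
            bonus = self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(theta @ context + bonus)
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context


def composite_reward(success, seconds, dollars, w_time=0.1, w_cost=0.5):
    """Hypothetical composite reward: performance minus weighted time and cost."""
    return float(success) - w_time * seconds - w_cost * dollars


def simulate(rounds=2000, seed=0):
    """Toy environment: arm 0 = open-source model, arm 1 = closed-source model."""
    rng = np.random.default_rng(seed)
    selector = LinUCBSelector(n_models=2, dim=2)
    picks = {"easy": [0, 0], "hard": [0, 0]}
    for _ in range(rounds):
        hard = rng.random() < 0.5
        # Stand-in for the encoder output: a one-hot task-difficulty feature.
        context = np.array([0.0, 1.0]) if hard else np.array([1.0, 0.0])
        arm = selector.select(context)
        # Assumed behaviour: the open model struggles on hard inputs only.
        p_success = [0.3 if hard else 0.9, 0.9][arm]
        success = rng.random() < p_success
        # Assumed latency (seconds) and price (dollars) per call for each arm.
        seconds, dollars = [(0.5, 0.0), (1.0, 0.2)][arm]
        selector.update(arm, context, composite_reward(success, seconds, dollars))
        picks["hard" if hard else "easy"][arm] += 1
    return picks


if __name__ == "__main__":
    print(simulate())
```

In this toy setting the selector should learn to route easy inputs mostly to the free model and hard inputs mostly to the paid one. A non-contextual bandit cannot make that distinction, because it has to commit to one model for all inputs.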

