As the AI community increasingly adopts large-scale models, it is crucial to develop general and flexible tools for integrating them. We introduce Gather-Attend-Scatter (GATS), a novel module that enables the seamless combination of pretrained foundation models, both trainable and frozen, into larger multimodal networks. GATS lets AI systems process and generate information across multiple modalities at different rates. In contrast to traditional fine-tuning, GATS allows the original component models to remain frozen, avoiding the risk that they lose important knowledge acquired during pretraining. We demonstrate the utility and versatility of GATS through experiments spanning games, robotics, and multimodal input-output systems.