To teach robots complex manipulation tasks, it is now common practice to fine-tune a pre-trained vision-language-action (VLA) model on task-specific data. However, because fine-tuning updates existing representations, this recipe is ill-suited to long-term operation in the real world, where robots must continually adapt to new tasks and environments while retaining the knowledge they have already acquired. Existing continual learning methods for robotics typically require storing previous data (exemplars), struggle with long task sequences, or rely on task identifiers at deployment. To address these limitations, we propose CLARE, a general, parameter-efficient framework for exemplar-free continual learning with VLAs. CLARE introduces lightweight modular adapters into selected feedforward layers and, guided by layer-wise feature similarity, autonomously expands the model only where necessary when learning a new task. During deployment, an autoencoder-based routing mechanism dynamically activates the most relevant adapters without requiring task labels. In extensive experiments on the LIBERO benchmark, CLARE achieves high performance on new tasks without catastrophic forgetting of earlier ones, significantly outperforming even exemplar-based methods. Code and data are available at https://tum-lsy.github.io/clare.
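The task-label-free routing described above can be illustrated with a minimal sketch. The assumption here (not detailed in the abstract) is that one small autoencoder is kept per learned task and, at deployment, the adapter whose autoencoder best reconstructs the current feature vector is activated; all class and function names are hypothetical.

```python
import numpy as np

class TaskAutoencoder:
    """Tiny linear autoencoder kept per task (illustrative only)."""

    def __init__(self, dim: int, code_dim: int, rng: np.random.Generator):
        # Randomly initialized weights stand in for trained parameters.
        self.enc = rng.standard_normal((dim, code_dim)) * 0.1
        self.dec = rng.standard_normal((code_dim, dim)) * 0.1

    def recon_error(self, x: np.ndarray) -> float:
        # Reconstruction error acts as a task-affinity score:
        # low error suggests the feature came from this task's distribution.
        return float(np.linalg.norm(x - x @ self.enc @ self.dec))

def route(x: np.ndarray, autoencoders: list[TaskAutoencoder]) -> int:
    """Return the index of the adapter to activate for feature vector x."""
    errors = [ae.recon_error(x) for ae in autoencoders]
    return int(np.argmin(errors))
```

Because routing depends only on the incoming features, no task identifier is needed at test time; adding a task amounts to training one more adapter and one more autoencoder.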