In learning embodied agents that execute daily tasks via language directives, the literature largely assumes that the agent has access to all training data from the start. We argue that such a learning scenario is less realistic, since a robotic agent should learn about the world continuously as it explores and perceives it. To take a step towards a more realistic embodied agent learning scenario, we propose two continual learning setups for embodied agents: learning new behaviors (Behavior Incremental Learning, Behavior-IL) and learning new environments (Environment Incremental Learning, Environment-IL). For these tasks, previous 'data prior' based continual learning methods store logits for past tasks. However, the stored logits often encode insufficiently learned information, and these methods require task boundary information, which may not always be available. Here, we propose Confidence-Aware Moving Average (CAMA), which updates the stored logits based on confidence scores in a moving-average fashion, without requiring task boundary information during training (i.e., task-free). In the proposed Behavior-IL and Environment-IL setups, our simple CAMA outperforms the prior state of the art in our empirical validations by noticeable margins. The project page, including code, is available at https://github.com/snumprlab/cl-alfred.
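To make the CAMA idea concrete, below is a minimal sketch of a confidence-weighted moving-average update of stored logits. It assumes the update interpolates between the stored logits and the current model's logits using a coefficient derived from the model's confidence (here, the softmax probability of the ground-truth label); the function name `cama_update` and this particular choice of confidence score are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def cama_update(stored_logits: torch.Tensor,
                new_logits: torch.Tensor,
                labels: torch.Tensor) -> torch.Tensor:
    """Move stored logits toward the current model's logits, weighted by
    the model's confidence on the ground-truth class (illustrative)."""
    # Confidence = softmax probability assigned to the true label.
    # (One plausible choice; the paper may derive its coefficient differently.)
    probs = F.softmax(new_logits, dim=-1)
    conf = probs.gather(-1, labels.unsqueeze(-1))  # shape: (batch, 1)
    # Moving average: confident new predictions overwrite stale logits more.
    return (1.0 - conf) * stored_logits + conf * new_logits

# Usage on a replayed mini-batch (shapes are illustrative):
stored = torch.randn(4, 10)           # logits saved in episodic memory
current = torch.randn(4, 10)          # logits from the current model
labels = torch.randint(0, 10, (4,))   # ground-truth class labels
updated = cama_update(stored, current, labels)
```

Because the coefficient comes from the model's own confidence rather than from task identifiers, this style of update needs no task boundary information, which is what makes it applicable in the task-free setting the abstract describes.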