Embodied agents are expected to perform object navigation in dynamic, open-world environments. However, existing approaches typically train on static trajectories and a fixed set of object categories, overlooking the real-world need for continual adaptation to evolving scenarios. To facilitate research on this problem, we introduce a continual object navigation benchmark, which requires agents to acquire navigation skills for new object categories while avoiding catastrophic forgetting of previously learned knowledge. To tackle this challenge, we propose C-Nav, a continual visual navigation framework that integrates two key innovations: (1) a dual-path anti-forgetting mechanism, comprising feature distillation, which aligns multi-modal inputs into a consistent representation space to ensure representation consistency, and feature replay, which retains temporal features within the action decoder to ensure policy consistency; and (2) an adaptive sampling strategy that selects diverse and informative experiences, thereby reducing redundancy and minimizing memory overhead. Extensive experiments across multiple model architectures show that C-Nav consistently outperforms existing approaches, surpassing even baselines that retain full trajectories while requiring significantly less memory. The code will be publicly available at https://bigtree765.github.io/C-Nav-project.
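To make the two ideas in the abstract more concrete, the following is a minimal sketch and not the C-Nav implementation. The abstract does not specify the distillation objective or the sampling criterion, so both are assumptions here: the distillation term is taken to be a mean squared error between features from a frozen previous model and the current model, and the "diverse" selection is illustrated with greedy farthest-point sampling. The function names `distillation_loss` and `select_diverse` are hypothetical.

```python
import math


def distillation_loss(old_feats, new_feats):
    """Assumed distillation term: mean squared error between the features
    of a frozen previous model and those of the current model."""
    if len(old_feats) != len(new_feats):
        raise ValueError("feature vectors must have the same length")
    return sum((o - n) ** 2 for o, n in zip(old_feats, new_feats)) / len(old_feats)


def select_diverse(features, k):
    """Assumed sampling rule: greedy farthest-point sampling.
    Returns the indices of up to k mutually distant feature vectors."""
    if k <= 0 or not features:
        return []
    chosen = [0]  # start from the first stored feature (arbitrary choice)
    remaining = set(range(1, len(features)))
    # Distance from each feature to its nearest already-chosen feature.
    nearest = [math.dist(f, features[0]) for f in features]
    while remaining and len(chosen) < k:
        # Pick the unchosen feature farthest from everything chosen so far.
        idx = max(sorted(remaining), key=nearest.__getitem__)
        chosen.append(idx)
        remaining.remove(idx)
        nearest = [min(d, math.dist(f, features[idx]))
                   for d, f in zip(nearest, features)]
    return chosen
```

Under these assumptions, a replay buffer would store only the features at the indices returned by `select_diverse`, rather than every step of every trajectory, which is one way a sampling strategy can reduce redundancy and memory use.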