Efficient navigation in dynamic environments requires anticipating how motion patterns evolve beyond the robot's immediate perceptual range, enabling preemptive rather than purely reactive planning in crowded scenes. Maps of Dynamics (MoDs) offer a structured representation of spatial motion tendencies that is useful for long-term global planning, but constructing them traditionally requires global observations of the environment over extended periods of time. We introduce EgoMoD, the first approach that learns to predict future MoDs directly from short egocentric video clips collected during robot operation. Our method infers environment-wide motion tendencies from local dynamic cues using a video- and pose-conditioned architecture. This architecture is trained with MoDs computed from external observations as privileged supervision, so that local observations serve as predictive signals of global motion structure. As a result, EgoMoD can forecast future motion dynamics across the whole environment, rather than merely extending past patterns observed in the robot's field of view. Acting as a site-specific dynamic prior, EgoMoD replaces, at inference time, the external global sensing infrastructure required by prior MoD methods with standard onboard sensors. Experiments in large simulated environments show that EgoMoD predicts future MoDs under limited observability, while evaluation on real images demonstrates its zero-shot transferability to real systems.