Social navigation requires robots to act safely in dynamic human environments. Effective behavior demands thinking ahead: reasoning about how the scene and pedestrians will evolve under different robot actions, rather than reacting to current observations alone. This creates a coupled prediction-planning challenge in which robot actions and human motion influence each other. To address it, we propose NavThinker, a future-aware framework that couples an action-conditioned world model with on-policy reinforcement learning. The world model operates in the Depth Anything V2 patch feature space and autoregressively predicts future scene geometry and human motion; multi-head decoders then produce future depth maps and human trajectories, yielding a future-aware state aligned with traversability and interaction risk. Crucially, we train the policy with DD-PPO while injecting the world model's think-ahead signals in two ways: (i) action-conditioned future features are fused into the current observation embedding, and (ii) social reward shaping is derived from predicted human trajectories. Experiments on single- and multi-robot Social-HM3D show state-of-the-art navigation success, and zero-shot transfer to Social-MP3D together with real-world deployment on a Unitree Go2 validates generalization and practical applicability. Webpage: https://hutslib.github.io/NavThinker.
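To make the idea of social reward shaping from predicted human trajectories concrete, the following is a minimal illustrative sketch, not the reward used in NavThinker. It assumes the world model has already produced a predicted 2D robot path and predicted 2D human paths over a short horizon. The function name `social_shaping_penalty`, the linear hinge form, and the parameters `safety_radius`, `discount`, and `weight` are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def social_shaping_penalty(
    robot_path: Sequence[Point],
    human_paths: Sequence[Sequence[Point]],
    safety_radius: float = 1.0,  # hypothetical comfort distance in meters
    discount: float = 0.9,       # hypothetical down-weighting of later steps
    weight: float = 0.5,         # hypothetical scale of the shaping term
) -> float:
    """Return a non-positive shaping term from predicted proximity to humans."""
    penalty = 0.0
    for t, (rx, ry) in enumerate(robot_path):
        # Distance to the closest predicted human at future step t
        # (infinity if no human has a prediction for this step).
        nearest = min(
            (
                math.hypot(rx - path[t][0], ry - path[t][1])
                for path in human_paths
                if t < len(path)
            ),
            default=math.inf,
        )
        # Linear hinge: zero outside the safety radius, growing as the gap closes.
        intrusion = max(0.0, safety_radius - nearest)
        penalty += (discount ** t) * intrusion
    return -weight * penalty


# Toy example: the robot moves right while one pedestrian walks toward it.
robot = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]
humans = [[(3.0, 0.0), (2.0, 0.0), (1.5, 0.0)]]
shaping = social_shaping_penalty(robot, humans)
```

In an on-policy setup, a term like `shaping` would typically be added to the task reward at each step, so that action sequences predicted to bring the robot close to pedestrians are discouraged before any close encounter actually occurs.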