Robots navigating dynamic, cluttered, and semantically complex environments must integrate perception, symbolic reasoning, and spatial planning to generalize across diverse layouts and object categories. Existing methods often rely on static priors or limited memory, constraining adaptability under partial observability and semantic ambiguity. We present GRIP (Grid-based Relay with Intermediate Planning), a unified, modular framework with three scalable variants: GRIP-L (Lightweight), optimized for symbolic navigation via semantic occupancy grids; GRIP-F (Full), supporting multi-hop anchor chaining and LLM-based introspection; and GRIP-R (Real-World), enabling physical robot deployment under perceptual uncertainty. GRIP integrates dynamic 2D grid construction, open-vocabulary object grounding, co-occurrence-aware symbolic planning, and hybrid policy execution combining behavioral cloning, D* search, and grid-conditioned control. Empirical results on the AI2-THOR and RoboTHOR benchmarks show that GRIP achieves up to 9.6% higher success rates and over $2\times$ improvement in path efficiency (SPL and SAE) on long-horizon tasks. Qualitative analyses reveal interpretable symbolic plans in ambiguous scenes. Real-world deployment on a Jetbot further validates GRIP's generalization under sensor noise and environmental variation. These results position GRIP as a robust, scalable, and explainable framework bridging simulation and real-world navigation.