Effective embodied exploration requires agents to accumulate and retain spatial knowledge over time. However, existing scene representations, such as discrete scene graphs or static view-based snapshots, lack \textit{post-hoc re-observability}: if an initial observation misses a target, the resulting memory omission is often irrecoverable. To bridge this gap, we propose \textbf{GSMem}, a zero-shot embodied exploration and reasoning framework built upon 3D Gaussian Splatting (3DGS). By explicitly parameterizing continuous geometry and dense appearance, 3DGS serves as a persistent spatial memory that endows the agent with \textit{Spatial Recollection}: the ability to render photorealistic novel views from optimal, previously unoccupied viewpoints. To operationalize this, GSMem employs a retrieval mechanism that queries object-level scene graphs and semantic-level language fields in parallel. This complementary design robustly localizes target regions, enabling the agent to ``hallucinate'' optimal views for high-fidelity Vision-Language Model (VLM) reasoning. Furthermore, we introduce a hybrid exploration strategy that combines VLM-driven semantic scoring with a 3DGS-based coverage objective, balancing task-aware exploration with geometric coverage. Extensive experiments on embodied question answering and lifelong navigation demonstrate the robustness and effectiveness of our framework.