End-to-end navigation policies trained on large simulation corpora degrade sharply when transferred to out-of-distribution scenes, categories, or goal modalities. Modular pipelines such as Modular GOAT are bottlenecked by the recall of closed-set object detectors, while 3D snapshot-memory systems (e.g., 3D-Mem) accumulate dense, view-dependent representations that are costly to maintain. We present AnyGoal, a training-free multi-robot architecture that places a Vision-Language Model (VLM) at the core of frontier-based exploration and coordinates agents through a shared 2D Gaussian Bayesian Value Map (BVM). The BVM maintains a per-pixel Gaussian posterior (μ, σ²) over goal relevance, updated by precision-weighted fusion of VLM scores within a depth-cone mask, and is never reset between subtasks, yielding lifelong evidence accumulation. Frontiers are ranked by a convex blend of a VLM-as-judge softmax and a Bayesian UCB term computed on the BVM. A greedy allocator with a spatial-separation penalty and commitment hysteresis distributes frontiers across agents without a centralized controller. On the full GOAT-Bench val unseen split (360 episodes, 2,669 subtasks), our dual-agent system achieves 52.4% Subtask SR at 12.7% SPL, which is state of the art under the strict physical regime (discrete 0.25 m steps, no teleportation, 42° HFOV) and a +27.5 pp improvement over Modular GOAT (24.9%). Single-agent AnyGoal achieves 41.9% Subtask SR, indicating that much of the gain arises from the decision architecture rather than from adding a second agent. A four-way perception ablation shows that open-vocabulary detectors shift the dominant failure mode from exploration to goal verification.
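As an illustration of the two scoring ingredients named in the abstract, the following minimal sketch shows a precision-weighted Gaussian update of a per-pixel (μ, σ²) map and a convex blend of a softmax over judge scores with a UCB term. It is not the authors' implementation. The function names (`fuse_bvm`, `rank_frontiers`), the scalar observation variance `obs_var`, the weights `alpha` and `beta`, and the representation of the depth-cone mask as a precomputed boolean array are all assumptions introduced here for illustration. The abstract does not specify how the two terms are scaled relative to each other, so the blend below uses them unnormalized.

```python
import numpy as np


def fuse_bvm(mu, var, score, obs_var, mask):
    """Precision-weighted Gaussian update of the cells selected by `mask`.

    mu, var : 2D arrays holding the per-pixel posterior mean and variance.
    score   : scalar VLM relevance score for the current view (assumed scalar).
    obs_var : assumed variance of that score; smaller means more trusted.
    mask    : boolean 2D array standing in for the depth-cone footprint.
    """
    prior_prec = 1.0 / var[mask]
    obs_prec = 1.0 / obs_var
    post_prec = prior_prec + obs_prec
    new_mu, new_var = mu.copy(), var.copy()
    new_mu[mask] = (prior_prec * mu[mask] + obs_prec * score) / post_prec
    new_var[mask] = 1.0 / post_prec
    return new_mu, new_var


def softmax(x):
    """Numerically stable softmax over a 1D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


def rank_frontiers(judge_logits, mu, var, cells, alpha=0.5, beta=1.0):
    """Blend a softmax over judge scores with a UCB term mu + beta * sigma.

    cells : list of (row, col) map cells, one per frontier (hypothetical layout).
    alpha : blend weight in [0, 1]; beta : exploration weight. Both are assumed.
    """
    idx = np.asarray(cells)
    rows, cols = idx[:, 0], idx[:, 1]
    ucb = mu[rows, cols] + beta * np.sqrt(var[rows, cols])
    judge = softmax(np.asarray(judge_logits, dtype=float))
    return alpha * judge + (1.0 - alpha) * ucb
```

Because the update only adds precision, the variance of an observed cell can never grow, which is one way a map that is never reset between subtasks keeps accumulating evidence. With equal judge scores, the UCB term favors a frontier in an unobserved, high-variance region over one in a region that has already been scored.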