Adaptive navigation in unfamiliar indoor environments is crucial for household service robots. Despite advances in zero-shot perception and reasoning from vision-language models, existing navigation systems still rely on single-pass scoring at the decision layer, which leads to overconfident long-horizon errors and redundant exploration. To address these problems, we propose Dual-Stance Cooperative Debate Navigation (DSCD-Nav), a decision mechanism that replaces one-shot scoring with stance-based cross-checking and evidence-aware arbitration to improve action reliability under partial observability. Specifically, given the same observation and candidate action set, we explicitly construct two stances by conditioning the evaluation on diverse, complementary objectives: a Task-Scene Understanding (TSU) stance that prioritizes goal progress based on scene-layout cues, and a Safety-Information Balancing (SIB) stance that emphasizes risk and information value. The two stances then engage in a cooperative debate, cross-checking their respective top candidates through cue-grounded arguments to converge on a decision. A Navigation Consensus Arbitration (NCA) agent consolidates the reasoning and evidence from both sides and can optionally trigger lightweight micro-probing to verify uncertain choices, disambiguating the decision while preserving NCA's primary intent. Experiments on HM3Dv1, HM3Dv2, and MP3D demonstrate consistent improvements in success rate and path efficiency, together with reduced exploration redundancy.
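To make the decision flow concrete, the following is a toy sketch of dual-stance scoring, cross-checking, and arbitration with an optional probe. It is not the DSCD-Nav implementation: in the method described above the stances are VLM-conditioned evaluators, whereas here every cue (`goal_progress`, `risk`, `info_gain`), both scoring formulas, the summed arbitration score, the `margin` threshold, and the `probe` callback are hypothetical stand-ins chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    """A candidate action with hypothetical, pre-extracted cue values."""
    name: str
    goal_progress: float  # hypothetical: estimated progress toward the goal
    risk: float           # hypothetical: estimated collision / dead-end risk
    info_gain: float      # hypothetical: expected value of newly observed area


def tsu_score(c: Candidate) -> float:
    # Illustrative TSU stance: rank candidates purely by goal progress.
    return c.goal_progress


def sib_score(c: Candidate) -> float:
    # Illustrative SIB stance: trade information value against risk.
    return c.info_gain - c.risk


def arbitrate(cands: List[Candidate],
              probe: Optional[Callable[[Candidate], bool]] = None,
              margin: float = 0.1) -> str:
    """Toy arbiter: accept consensus, otherwise cross-check both top picks."""
    tsu_pick = max(cands, key=tsu_score)
    sib_pick = max(cands, key=sib_score)
    if tsu_pick.name == sib_pick.name:
        return tsu_pick.name  # both stances agree: no debate needed

    # Cross-check: score each stance's pick under BOTH objectives.
    def combined(c: Candidate) -> float:
        return tsu_score(c) + sib_score(c)

    best, runner_up = sorted([tsu_pick, sib_pick], key=combined, reverse=True)
    # Only when the arbiter is uncertain (small margin) is a probe triggered;
    # the primary choice is kept unless the probe contradicts it.
    if probe is not None and combined(best) - combined(runner_up) < margin:
        if not probe(best):
            return runner_up.name
    return best.name
```

For example, with a high-progress but risky candidate `A`, a balanced candidate `B`, and a cautious candidate `C`, the stances disagree (TSU prefers `A`, SIB prefers `B`), and the cross-check resolves the debate in favour of `B` without probing because the margin is large. The probe is consulted only in close calls, which mirrors the abstract's "optionally triggering lightweight micro-probing" in spirit only.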