We propose a new method for improving zero-shot ObjectNav that uses potentially available environmental percepts for navigational assistance. Our approach accounts for the fact that the ground agent may have a limited and sometimes obstructed view. Our formulation encourages Generative Communication (GC) between an assistive overhead agent, whose global view contains the target object, and a ground agent with an obfuscated view; both are equipped with Vision-Language Models (VLMs) for vision-to-language translation. In this assisted setup, the embodied agents communicate environmental information before the ground agent executes actions towards a target. Despite the overhead agent having a global view that contains the target, we observe a drop in performance (-13% in OSR and -13% in SPL) for a fully cooperative assistance scheme relative to an unassisted baseline. In contrast, a selective assistance scheme in which the ground agent retains its independent exploratory behaviour shows a 10% OSR and 7.65% SPL improvement. To explain navigation performance, we analyze the GC for unique traits, quantifying the presence of hallucination and cooperation. Specifically, we identify the novel linguistic trait of preemptive hallucination in our embodied setting, where the overhead agent assumes in the dialogue that the ground agent has executed an action when it has yet to move, and note its strong correlation with navigation performance. We conduct real-world experiments and present qualitative examples in which we mitigate hallucinations via prompt finetuning to improve ObjectNav performance.