Zero-shot object navigation, in which an agent navigates toward a specific object in an unknown environment without additional training, is a significant challenge in robotics that demands rich auxiliary information and strategic planning. Prior work has focused on holistic solutions, overlooking the specific problems agents encounter during navigation, such as collisions, low exploration efficiency, and target misidentification. To address these problems, we propose TriHelper, a novel framework that dynamically assists agents with three primary navigation challenges: collision, exploration, and detection. Specifically, the framework consists of three components: (i) Collision Helper, (ii) Exploration Helper, and (iii) Detection Helper. These components work together to address the corresponding challenges throughout the navigation process. Experiments on the Habitat-Matterport 3D (HM3D) and Gibson datasets demonstrate that TriHelper significantly outperforms all existing baseline methods in zero-shot object navigation, achieving higher success rates and exploration efficiency. Our ablation studies further confirm that each helper is effective in addressing its respective challenge and notably improves the agent's navigation capabilities. With TriHelper, we offer a fresh perspective on the object navigation task, paving the way for future research in Embodied AI and vision-based navigation.