This paper addresses object navigation in autonomous navigation systems, focusing on target approach and episode termination in Deep Reinforcement Learning (DRL) based methods when the optimal episode length is long. While effective at environment exploration and object localization, conventional DRL methods often struggle with optimal path planning and with recognizing when to terminate, due to a lack of depth information. To overcome these limitations, we propose the Depth-Inference Termination Agent (DITA), which incorporates a supervised model, called the Judge Model, that implicitly infers object-wise depth and decides termination jointly with the reinforcement learning agent. We train the Judge Model in parallel with reinforcement learning and supervise it efficiently using the reward signal. Our evaluation shows that DITA outperforms our baseline method: it achieves a 9.3% gain in success rate across all room types and a 51.2% improvement in long-episode environments, while maintaining a slightly better Success weighted by Path Length (SPL). Code, resources, and visualizations are available at: https://github.com/HuskyKingdom/DITA_acml2023
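For readers unfamiliar with the SPL metric, the following is a minimal sketch of how it is commonly computed: each successful episode contributes the ratio of the shortest-path length to the longer of the shortest path and the path actually taken, and the result is averaged over all episodes. The function name `spl`, its arguments, and the example values are illustrative assumptions, not taken from the DITA codebase, whose evaluation code may differ.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    taken_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over all episodes.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if not (len(successes) == len(shortest_lengths) == len(taken_lengths)):
        raise ValueError("all inputs must have the same number of episodes")
    if len(successes) == 0:
        raise ValueError("at least one episode is required")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, taken_lengths):
        # Failed episodes contribute zero.
        if not success:
            continue
        denom = max(taken, shortest)
        # Assumption: degenerate zero-length episodes are skipped to avoid
        # dividing by zero.
        if denom > 0:
            total += shortest / denom
    return total / len(successes)


# Made-up values: one success with a path twice the optimal length,
# one failure, and one success along the optimal path.
example = spl([True, False, True], [5.0, 4.0, 10.0], [10.0, 8.0, 10.0])
print(example)
```

Under this definition an agent is rewarded only when it both succeeds and takes a path close to the shortest one, so a gain in success rate does not automatically translate into a gain in SPL.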