In this paper, we address the problem of efficiently locating a target object described in free-form language using a mobile robot equipped with vision sensors (e.g., an RGB-D camera). Conventional active visual search methods require a predefined set of target objects, which limits their usefulness in practice. To make active visual search more flexible, we propose a system in which a user specifies the target in free-form language; we call this system Active Visual Search in the Wild (AVSW). AVSW detects the user-specified target object and plans the search for it over a semantic grid map represented by static landmarks (e.g., a desk or a bed). To plan the search efficiently, AVSW considers commonsense knowledge-based co-occurrence and predictive uncertainty when deciding which landmarks to visit first. We validate the proposed method in terms of success rate (SR) and success weighted by path length (SPL) in both simulated and real-world environments. In simulated scenarios, the proposed method outperforms previous methods in SPL by an average margin of 0.283. We further demonstrate AVSW on a Pioneer-3AT robot in real-world studies.
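For readers unfamiliar with the two evaluation metrics, the following is a minimal sketch of how SR and SPL are commonly computed in the embodied navigation literature. It assumes the widely used definition in which each successful episode contributes the ratio of the shortest path length to the larger of the shortest and the actual path length; the abstract does not spell out the formula, so this may differ in detail from the paper's evaluation code. The function and parameter names are illustrative, not taken from the paper.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the target object was found."""
    if not successes:
        raise ValueError("at least one episode is required")
    return sum(1 for s in successes if s) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by path length (assumed standard definition).

    successes[i]        -- whether episode i succeeded
    shortest_lengths[i] -- shortest start-to-target path length for episode i
    path_lengths[i]     -- length of the path the robot actually travelled
    """
    n = len(successes)
    if n == 0:
        raise ValueError("at least one episode is required")
    if len(shortest_lengths) != n or len(path_lengths) != n:
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for ok, shortest, actual in zip(successes, shortest_lengths, path_lengths):
        denom = max(actual, shortest)
        # Failed episodes contribute 0; so do degenerate zero-length episodes.
        if ok and denom > 0:
            total += shortest / denom
    return total / n


if __name__ == "__main__":
    # Three hypothetical episodes: optimal success, detour success, failure.
    ok = [True, True, False]
    shortest = [2.0, 2.0, 2.0]
    travelled = [2.0, 4.0, 3.0]
    print(success_rate(ok), spl(ok, shortest, travelled))
```

Under this definition, SPL is never larger than SR, and the two are equal only when every successful episode follows a shortest path. A gap of 0.283 in SPL therefore reflects both how often the target is found and how directly the robot reaches it.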