With the increasing demand for service robots and automated inspection, agents need to localize themselves in their surrounding environment in order to communicate more naturally with humans through shared context. In this work, we propose a novel but straightforward task of precise target view localization for look-around agents, called the FindView task. This task imitates the movements of PTZ cameras or of user interfaces for 360-degree media, where the observer must "look around" to find a view that exactly matches the target. To solve this task, we introduce a rule-based agent that heuristically finds the optimal view and a policy learning agent that uses reinforcement learning to learn by interacting with the 360-degree scene. Through extensive evaluations and benchmarks, we conclude that learned methods have several advantages: in particular, they localize precisely, are robust to corruption, and can be easily deployed in novel scenes.