This paper proposes a method for learning continuous control policies for active landmark localization and exploration using an information-theoretic cost. We consider a mobile robot that detects landmarks within a limited sensing range, and tackle the problem of learning a control policy that maximizes the mutual information between the landmark states and the sensor observations. We employ a Kalman filter to convert the problem, which is partially observable in the landmark state, into a Markov decision process (MDP), a differentiable field of view to shape the reward, and an attention-based neural network to represent the control policy. The approach is further unified with active volumetric mapping to promote exploration in addition to landmark localization. Performance is demonstrated on several simulated landmark localization tasks in comparison with benchmark methods.
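To make the reward idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes a single static 2-D landmark with a Gaussian belief, a direct position measurement (observation matrix equal to the identity), and a sigmoid of the distance to the sensing range as the smooth field-of-view weight. The function names `soft_fov` and `info_gain` and the parameters `sensing_range` and `sharpness` are hypothetical. Under these assumptions, the mutual information between landmark state and observation reduces to half the drop in log-determinant of the covariance across the Kalman update.

```python
import numpy as np

def soft_fov(robot_xy, landmark_xy, sensing_range=2.0, sharpness=4.0):
    """Smooth stand-in (assumed form) for the binary 'landmark in range' test."""
    dist = np.linalg.norm(np.asarray(landmark_xy, float) - np.asarray(robot_xy, float))
    # Close to 1 well inside the range, close to 0 well outside it.
    return 1.0 / (1.0 + np.exp(-sharpness * (sensing_range - dist)))

def info_gain(robot_xy, landmark_mean, landmark_cov, meas_noise_cov,
              sensing_range=2.0, sharpness=4.0):
    """Return (mutual-information reward, posterior covariance).

    Assumes z = landmark position + Gaussian noise, so the Kalman update in
    information form is: posterior_info = prior_info + w * inv(noise_cov),
    where w is the soft field-of-view weight (an illustrative approximation).
    """
    w = soft_fov(robot_xy, landmark_mean, sensing_range, sharpness)
    prior_info = np.linalg.inv(landmark_cov)
    post_info = prior_info + w * np.linalg.inv(meas_noise_cov)
    post_cov = np.linalg.inv(post_info)
    # Gaussian mutual information: 0.5 * (log det prior - log det posterior).
    _, logdet_prior = np.linalg.slogdet(landmark_cov)
    _, logdet_post = np.linalg.slogdet(post_cov)
    return 0.5 * (logdet_prior - logdet_post), post_cov

if __name__ == "__main__":
    cov = np.eye(2)            # prior landmark covariance
    noise = 0.25 * np.eye(2)   # measurement noise covariance
    near, _ = info_gain([0.0, 0.0], [0.5, 0.0], cov, noise)
    far, _ = info_gain([0.0, 0.0], [10.0, 0.0], cov, noise)
    print(f"gain near: {near:.3f}, gain far: {far:.3e}")
```

Because the weight varies smoothly with the robot position, the reward has a usable gradient with respect to the robot's motion even when the landmark is just outside the sensing range, which a hard in-range indicator would not provide.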