This paper proposes a novel model-based policy gradient algorithm for tracking dynamic targets with a mobile robot equipped with an onboard sensor that has a limited field of view. The task is to obtain a continuous control policy that lets the robot collect sensor measurements which reduce the uncertainty in the target states, as measured by the entropy of the target distribution. We design a neural network control policy that takes as inputs the robot's $SE(3)$ pose and the mean vector and information matrix of the joint target distribution, and that uses attention layers to handle a variable number of targets. We also derive the gradient of the target entropy with respect to the network parameters explicitly, which allows efficient model-based policy gradient optimization.
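To make the entropy objective concrete, the following is a minimal sketch, not the paper's implementation. It assumes the joint target distribution is Gaussian, so that its differential entropy depends only on the information matrix $Y$ through $\frac{1}{2}\left(n \log(2\pi e) - \log\det Y\right)$, and it assumes a scalar linear-Gaussian measurement model. The function names `log_det_spd`, `gaussian_entropy`, and `add_scalar_measurement` are hypothetical and chosen only for illustration.

```python
import math


def log_det_spd(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                d = matrix[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                chol[i][j] = math.sqrt(d)
            else:
                chol[i][j] = (matrix[i][j] - s) / chol[j][j]
    # det(matrix) = prod(diag(chol))^2
    return 2.0 * sum(math.log(chol[i][i]) for i in range(n))


def gaussian_entropy(info_matrix):
    """Differential entropy (in nats) of a Gaussian given its information matrix.

    Assumption: the target distribution is Gaussian.
    """
    n = len(info_matrix)
    return 0.5 * (n * math.log(2.0 * math.pi * math.e) - log_det_spd(info_matrix))


def add_scalar_measurement(info_matrix, h, r):
    """Information-form update for one scalar measurement z = h^T x + v, v ~ N(0, r).

    Assumption: linear-Gaussian sensor model, used here only for illustration.
    """
    n = len(info_matrix)
    return [[info_matrix[i][j] + h[i] * h[j] / r for j in range(n)] for i in range(n)]


# Hypothetical two-dimensional target state with unit prior information.
prior_info = [[1.0, 0.0], [0.0, 1.0]]
# One measurement of the first state component with noise variance 0.5.
posterior_info = add_scalar_measurement(prior_info, h=[1.0, 0.0], r=0.5)

entropy_before = gaussian_entropy(prior_info)
entropy_after = gaussian_entropy(posterior_info)
```

Under these assumptions, a measurement can only add information, so the entropy never increases. A target outside the field of view would contribute no update, which is why the robot's motion affects how quickly the entropy falls.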