In this work we propose a coverage planning control approach that allows a mobile agent, equipped with a controllable sensor (e.g., a camera) with a limited sensing domain (i.e., finite sensing range and angle of view), to cover the surface area of an object of interest. The proposed approach integrates ray-tracing into the coverage planning process, allowing the agent to identify which parts of the scene are visible at any point in time. The problem of integrated ray-tracing and coverage planning control is first formulated as a constrained optimal control problem (OCP), which aims to determine the agent's optimal control inputs over a finite planning horizon that minimize the coverage time. Efficiently solving the resulting OCP is, however, very challenging because the visibility constraints are non-convex and non-linear. To overcome this difficulty, the problem is converted into a Markov decision process (MDP), which is then solved using reinforcement learning. In particular, we show that a controller that follows an optimal control law can be learned using off-policy temporal-difference control (i.e., Q-learning). Extensive numerical experiments demonstrate the effectiveness of the proposed approach for various configurations of the agent and the object of interest.
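To make the MDP and Q-learning step concrete, the following is a minimal sketch of tabular Q-learning on a toy coverage task. It is an illustration under stated assumptions, not the formulation used in this work: the ring of `N_POS` agent positions, the `visible` function (a stand-in for ray-tracing in which each position sees exactly one surface segment), the reward of -1 per move, and all hyperparameters are hypothetical choices made for the example.

```python
import random

# All names and values below are hypothetical, chosen only for illustration.
N_POS = 6                  # agent positions on a ring around the object
N_FACES = 6                # surface segments of the object
ACTIONS = (-1, +1)         # move one position in either direction
FULL = (1 << N_FACES) - 1  # bitmask with every segment covered


def visible(pos):
    """Stand-in for ray-tracing: from position pos only segment pos is seen."""
    return 1 << pos


def step(state, action):
    """Deterministic MDP transition; every move costs -1 (coverage time)."""
    pos, covered = state
    pos = (pos + action) % N_POS
    covered |= visible(pos)
    return (pos, covered), -1.0, covered == FULL


def train(episodes=2000, alpha=0.5, gamma=1.0, eps=0.1, seed=0):
    """Off-policy TD control (Q-learning) with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = {}  # maps (state, action) to estimated return; unseen pairs default to 0
    for _ in range(episodes):
        state, done = (0, visible(0)), False
        while not done:
            if rng.random() < eps:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            nxt, reward, done = step(state, action)
            # The target uses the greedy value of the next state (off-policy).
            best_next = 0.0 if done else max(q.get((nxt, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
    return q


def greedy_rollout(q, max_steps=50):
    """Follow the learned greedy policy and count moves until full coverage."""
    state, done, steps = (0, visible(0)), False, 0
    while not done and steps < max_steps:
        action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
        state, _, done = step(state, action)
        steps += 1
    return steps, done


if __name__ == "__main__":
    print(greedy_rollout(train()))
```

In this toy setting the shortest sweep needs `N_POS - 1` moves, so the number of moves returned by `greedy_rollout` can be compared against that bound. The state pairs the agent position with a bitmask of covered segments, which keeps the process Markov even though the reward depends on the coverage history.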