An informative measurement is the most efficient way to gain information about an unknown state. We present a first-principles derivation of a general-purpose dynamic programming algorithm that returns an optimal sequence of informative measurements by sequentially maximizing the entropy of possible measurement outcomes. An autonomous agent or robot can use this algorithm to decide where best to measure next, planning a path that corresponds to an optimal sequence of informative measurements. The algorithm applies to states and controls that are either continuous or discrete, and to agent dynamics that are either stochastic or deterministic, including Markov decision processes and Gaussian processes. Recent results from approximate dynamic programming and reinforcement learning, including on-line approximations such as rollout and Monte Carlo tree search, allow the measurement task to be solved in real time. The resulting solutions include non-myopic paths and measurement sequences that can generally outperform, sometimes substantially, commonly used greedy approaches. We demonstrate this for a global search task, where on-line planning of a sequence of local searches is found to reduce the number of measurements in the search by approximately half. We also derive a variant of the algorithm for active sensing with Gaussian processes.
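The idea of planning measurements by maximizing outcome entropy over a horizon, rather than one step at a time, can be illustrated with a minimal sketch. It rests on a hypothetical toy problem of our own choosing, not the authors' setup: a target hides in one of N cells on a line, the agent moves at most one cell per step, and it then asks a noiseless "is the target in my cell?" question. For a noiseless sensor, the entropy of each measurement outcome equals the information it yields, so the summed outcome entropy is the total information gained about the target's location. The sketch uses exact, exhaustive dynamic programming instead of the rollout or Monte Carlo tree search approximations mentioned above. The names `dp_value`, `greedy_value`, `miss_update` and `move_to` are illustrative only.

```python
from fractions import Fraction
from functools import lru_cache
from math import log2

# Hypothetical toy problem (an assumption for illustration): a target hides in
# one of N cells on a line; each step the agent moves by -1, 0 or +1 cells and
# then measures "is the target in my cell?" with no noise.
MOVES = (-1, 0, 1)


def binary_entropy(p: Fraction) -> float:
    """Entropy in bits of a yes/no outcome that is 'yes' with probability p."""
    if p == 0 or p == 1:
        return 0.0
    x = float(p)
    return -x * log2(x) - (1 - x) * log2(1 - x)


def move_to(pos: int, n: int, move: int) -> int:
    """Apply a move, keeping the agent inside cells 0..n-1."""
    return min(max(pos + move, 0), n - 1)


def miss_update(belief: tuple, cell: int) -> tuple:
    """Bayes update after observing 'target not in cell' (requires belief[cell] < 1)."""
    remaining = 1 - belief[cell]
    return tuple(Fraction(0) if i == cell else b / remaining for i, b in enumerate(belief))


@lru_cache(maxsize=None)
def dp_value(pos: int, belief: tuple, steps: int) -> float:
    """Maximum expected total outcome entropy (bits) over `steps` measurements."""
    if steps == 0:
        return 0.0
    best = 0.0
    for move in MOVES:
        cell = move_to(pos, len(belief), move)
        p_hit = belief[cell]
        value = binary_entropy(p_hit)
        if p_hit < 1:
            # On a miss, planning continues from `cell` with the updated belief;
            # on a hit the target is found and no uncertainty remains.
            value += float(1 - p_hit) * dp_value(cell, miss_update(belief, cell), steps - 1)
        best = max(best, value)
    return best


def greedy_value(pos: int, belief: tuple, steps: int) -> float:
    """Expected total outcome entropy of the myopic one-step policy."""
    if steps == 0:
        return 0.0
    # Pick the reachable cell with the largest immediate outcome entropy
    # (max() keeps the first candidate on ties).
    cell = max((move_to(pos, len(belief), m) for m in MOVES),
               key=lambda c: binary_entropy(belief[c]))
    p_hit = belief[cell]
    value = binary_entropy(p_hit)
    if p_hit < 1:
        value += float(1 - p_hit) * greedy_value(cell, miss_update(belief, cell), steps - 1)
    return value


if __name__ == "__main__":
    third = Fraction(1, 3)
    # 7 cells; target equally likely in cells 2, 5 and 6; agent starts at cell 3.
    belief = tuple(third if i in (2, 5, 6) else Fraction(0) for i in range(7))
    print(f"greedy: {greedy_value(3, belief, 3):.3f} bits")  # prints "greedy: 0.918 bits"
    print(f"DP:     {dp_value(3, belief, 3):.3f} bits")      # prints "DP:     1.585 bits"
```

In this toy instance, the myopic policy spends its first measurement on the nearby cell 2 and can then no longer reach cells 5 and 6 within the remaining steps. The dynamic programming plan moves toward the two-cell cluster first, and with three measurements it resolves the full log2(3) bits of prior uncertainty.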