Our goal is to develop theory and algorithms for establishing fundamental limits that a robot's sensors impose on its performance for a given task. To this end, we define a quantity that captures the amount of task-relevant information provided by a sensor. Using a novel version of the generalized Fano inequality from information theory, we demonstrate that this quantity provides an upper bound on the highest achievable expected reward for one-step decision-making tasks. We then extend this bound to multi-step problems via a dynamic programming approach. We present algorithms for numerically computing the resulting bounds and demonstrate our approach on three examples: (i) the lava problem from the literature on partially observable Markov decision processes, (ii) an example with continuous state and observation spaces corresponding to a robot catching a freely falling object, and (iii) obstacle avoidance using a depth sensor with non-Gaussian noise. By comparing our upper bounds with achievable lower bounds (computed by synthesizing or learning concrete control policies), we show that our approach establishes strong limits on achievable performance for these problems.
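To make the one-step idea concrete, the following is a minimal sketch of a Fano-type upper bound for a small discrete problem. It is an illustration under stated assumptions, not necessarily the exact bound derived in the paper: rewards are assumed to lie in [0, 1], the sensor's information is measured by the mutual information between state and observation, and the bound is obtained by inverting a Bernoulli KL divergence around the best reward achievable without any sensor. All function names and the two-state example are hypothetical.

```python
import math

def mutual_information(p_s, p_o_given_s):
    """I(s; o) in nats for a discrete prior p_s and sensor model p_o_given_s[s][o]."""
    n_s, n_o = len(p_s), len(p_o_given_s[0])
    p_o = [sum(p_s[s] * p_o_given_s[s][o] for s in range(n_s)) for o in range(n_o)]
    mi = 0.0
    for s in range(n_s):
        for o in range(n_o):
            joint = p_s[s] * p_o_given_s[s][o]
            if joint > 0.0:
                mi += joint * math.log(joint / (p_s[s] * p_o[o]))
    return mi

def bernoulli_kl(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), in nats."""
    total = 0.0
    for a, b in ((p, q), (1.0 - p, 1.0 - q)):
        if a > 0.0:
            if b <= 0.0:
                return math.inf
            total += a * math.log(a / b)
    return total

def kl_inverse(q, c, iters=100):
    """Approximate sup{p in [q, 1] : bernoulli_kl(p, q) <= c} by bisection.

    Returns the upper end of the bracket, so the result errs on the side
    of being a valid (slightly loose) upper bound.
    """
    lo, hi = q, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bernoulli_kl(mid, q) <= c:
            lo = mid
        else:
            hi = mid
    return hi

def blind_reward(p_s, reward):
    """Best expected reward when the action cannot depend on any observation."""
    n_a = len(reward[0])
    return max(sum(p_s[s] * reward[s][a] for s in range(len(p_s))) for a in range(n_a))

def optimal_reward(p_s, p_o_given_s, reward):
    """Best expected one-step reward: pick the best action for each observation."""
    n_s, n_o, n_a = len(p_s), len(p_o_given_s[0]), len(reward[0])
    return sum(
        max(sum(p_s[s] * p_o_given_s[s][o] * reward[s][a] for s in range(n_s))
            for a in range(n_a))
        for o in range(n_o)
    )

# Hypothetical example: two equally likely states, a sensor that reports the
# true state with probability 0.8, and reward 1 when the action matches the state.
p_s = [0.5, 0.5]
p_o_given_s = [[0.8, 0.2], [0.2, 0.8]]
reward = [[1.0, 0.0], [0.0, 1.0]]

info = mutual_information(p_s, p_o_given_s)
blind = blind_reward(p_s, reward)
best = optimal_reward(p_s, p_o_given_s, reward)
bound = kl_inverse(blind, info)

print(f"I(s;o) = {info:.4f} nats, blind = {blind:.3f}, "
      f"optimal = {best:.3f}, upper bound = {bound:.3f}")
```

The bound depends only on the sensor model and the reward without observations, so it can be evaluated without searching over policies. In this symmetric example the bound and the brute-force optimum agree; in general the bound may be loose, which is why comparing it against an achievable lower bound is informative.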