Our goal is to develop theory and algorithms for establishing fundamental limits that a robot's sensors impose on achievable performance for a given task. To this end, we define a quantity that captures the amount of task-relevant information provided by a sensor. Using a novel version of the generalized Fano inequality from information theory, we show that this quantity yields an upper bound on the highest achievable expected reward for one-step decision-making tasks. We then extend this bound to multi-step problems via a dynamic programming approach. We present algorithms for numerically computing the resulting bounds and demonstrate our approach on three examples: (i) the lava problem from the literature on partially observable Markov decision processes, (ii) an example with continuous state and observation spaces corresponding to a robot catching a freely falling object, and (iii) obstacle avoidance using a depth sensor with non-Gaussian noise. By comparing our upper bounds with achievable lower bounds (computed by synthesizing or learning concrete control policies), we show that our approach establishes strong limits on achievable performance for these problems.