This paper focuses on the problem of detecting and reacting to changes in the distribution of a sensorimotor controller's observables. The key idea is to design switching policies that take conformal quantiles as input, an approach we call conformal policy learning, which allows robots to detect distribution shifts with formal statistical guarantees. We show how to design such policies in two ways: by using conformal quantiles to switch between base policies with different characteristics (e.g., safety or speed), or by directly augmenting a policy's observation with a quantile and training it with reinforcement learning. Theoretically, we show that such policies achieve formal convergence guarantees in finite time. In addition, we thoroughly evaluate their advantages and limitations on two compelling use cases: simulated autonomous driving and active perception with a physical quadruped. Empirical results demonstrate that our approach outperforms five baselines. Apart from one ablation, it is also the simplest of the strategies compared. Being easy to use, flexible, and backed by formal guarantees, our work demonstrates how conformal prediction can be an effective tool for sensorimotor learning under uncertainty.
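The first design described above, switching between base policies on the basis of a conformal quantile, can be illustrated with a minimal sketch. It assumes a split-conformal quantile computed from calibration nonconformity scores, followed by a simple online quantile-tracking update (q <- q + lr * (err - alpha)). With bounded scores, this kind of update keeps the long-run miscoverage frequency close to alpha. The update rule, the threshold, and the names `QuantileTracker`, `select_policy`, `fast_policy`, `safe_policy`, and `run_episode` are hypothetical illustrations, not the paper's exact method.

```python
import math
from typing import Callable, List, Sequence

Policy = Callable[[object], str]


def split_conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to certify this alpha: return +inf.
        return math.inf
    return sorted(scores)[k - 1]


class QuantileTracker:
    """Assumed online update: q <- q + lr * (err - alpha), err = 1[score > q]."""

    def __init__(self, q0: float, alpha: float, lr: float) -> None:
        self.q = q0
        self.alpha = alpha
        self.lr = lr

    def update(self, score: float) -> float:
        err = 1.0 if score > self.q else 0.0  # miscoverage indicator
        # Raise q sharply after a miss, lower it slightly after a hit.
        self.q += self.lr * (err - self.alpha)
        return self.q


def select_policy(q: float, threshold: float, fast: Policy, safe: Policy) -> Policy:
    """Hypothetical switching rule: fall back to the safe policy when q is large."""
    return safe if q > threshold else fast


def fast_policy(obs: object) -> str:
    return "fast"  # stand-in for a speed-oriented base policy


def safe_policy(obs: object) -> str:
    return "safe"  # stand-in for a conservative base policy


def run_episode(calib: Sequence[float], stream: Sequence[float],
                alpha: float = 0.2, lr: float = 0.5,
                threshold: float = 1.8) -> List[str]:
    """Return the action label chosen at each step of a stream of scores."""
    tracker = QuantileTracker(split_conformal_quantile(calib, alpha), alpha, lr)
    actions = []
    for score in stream:
        # Decide with the current quantile, then observe the new score.
        policy = select_policy(tracker.q, threshold, fast_policy, safe_policy)
        actions.append(policy(None))
        tracker.update(score)
    return actions
```

For example, `run_episode([i / 10 for i in range(1, 11)], [0.3] * 5 + [3.0] * 6)` keeps the fast policy through the nominal segment and the first few shifted steps. It then switches to the safe policy once the tracked quantile crosses the threshold. The second design mentioned above would instead append `tracker.q` to the policy's observation and let reinforcement learning decide how to react.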