Active Uncertainty Reduction for Safe and Efficient Interaction Planning: A Shielding-Aware Dual Control Approach

The ability to accurately predict others' behavior is central to the safety and efficiency of interactive robotics. Unfortunately, robots often lack access to key information on which these predictions may hinge, such as other agents' goals, attention, and willingness to cooperate. Dual control theory addresses this challenge by treating unknown parameters of a predictive model as stochastic hidden states and inferring their values at runtime using information gathered during system operation. While able to optimally and automatically trade off exploration and exploitation, dual control is computationally intractable for general interactive motion planning. In this paper, we present a novel algorithmic approach to enable active uncertainty reduction for interactive motion planning based on the implicit dual control paradigm. Our approach relies on sampling-based approximation of stochastic dynamic programming, leading to a model predictive control problem that can be readily solved by real-time gradient-based optimization methods. The resulting policy is shown to preserve the dual control effect for a broad class of predictive models with both continuous and categorical uncertainty. To ensure the safe operation of the interacting agents, we use a runtime safety filter (also referred to as a "shielding" scheme), which overrides the robot's dual control policy with a safety fallback strategy when a safety-critical event is imminent. We then augment the dual control framework with an improved variant of the recently proposed shielding-aware robust planning scheme, which proactively balances the nominal planning performance with the risk of high-cost emergency maneuvers triggered by low-probability agent behaviors. We demonstrate the efficacy of our approach with both simulated driving studies and hardware experiments using 1/10 scale autonomous vehicles.
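The abstract describes two runtime mechanisms. The first infers an unknown categorical parameter of another agent's predictive model from observed behavior. The second is a shielding scheme that overrides the nominal policy with a fallback when a safety check fails. The Python sketch below shows both in a toy 1-D car-following setting. It is not the authors' implementation and omits the dual control MPC entirely. The mode names (`braking`, `cruising`), the predicted accelerations, the observation noise, the safety margin, the fallback deceleration, and the one-step gap check are all hypothetical choices made for illustration.

```python
import math

# Hypothetical categorical hidden parameter: the lead vehicle's driving mode,
# mapped to the acceleration (m/s^2) each mode predicts. Values are illustrative.
PREDICTED_ACCEL = {"braking": -2.0, "cruising": 0.0}


def gaussian_likelihood(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def update_belief(belief: dict[str, float], observed_accel: float,
                  noise_std: float = 0.5) -> dict[str, float]:
    """Bayes' rule over the categorical hidden state:
    posterior(theta) is proportional to likelihood(obs | theta) * prior(theta)."""
    weights = {
        theta: prior * gaussian_likelihood(observed_accel, PREDICTED_ACCEL[theta], noise_std)
        for theta, prior in belief.items()
    }
    total = sum(weights.values())
    return {theta: w / total for theta, w in weights.items()}


def shield(nominal_accel: float, gap: float, rel_speed: float,
           dt: float = 0.1, min_gap: float = 2.0,
           fallback_accel: float = -4.0) -> tuple[float, bool]:
    """Simplified runtime safety filter with a hypothetical one-step check.

    gap is the distance to the lead vehicle (m) and rel_speed is the lead
    speed minus the robot speed (m/s). The nominal action is kept unless the
    predicted gap after one step, under the most adverse lead mode (regardless
    of its probability), falls below min_gap; in that case the fallback
    (hard braking) overrides it. Returns (action, overridden).
    """
    worst_lead_accel = min(PREDICTED_ACCEL.values())
    predicted_gap = (gap + rel_speed * dt
                     + 0.5 * (worst_lead_accel - nominal_accel) * dt * dt)
    if predicted_gap < min_gap:
        return fallback_accel, True
    return nominal_accel, False


if __name__ == "__main__":
    belief = update_belief({"braking": 0.5, "cruising": 0.5}, observed_accel=-1.8)
    print(belief)  # belief shifts strongly toward "braking"
    print(shield(nominal_accel=1.0, gap=10.0, rel_speed=0.0))   # (1.0, False)
    print(shield(nominal_accel=1.0, gap=2.05, rel_speed=-1.0))  # (-4.0, True)
```

The shield here deliberately checks the worst-case mode rather than the most likely one. This mirrors the abstract's observation that low-probability agent behaviors can trigger costly emergency maneuvers, which is the risk a shielding-aware planner would try to anticipate.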
