Active Uncertainty Reduction for Safe and Efficient Interaction Planning: A Shielding-Aware Dual Control Approach

The ability to accurately predict the opponent's behavior is central to the safety and efficiency of robotic systems in interactive settings, such as human-robot interaction and multi-robot teaming tasks. Unfortunately, robots often lack access to key information on which these predictions may hinge, such as the opponent's goals, attention, and willingness to cooperate. Dual control theory addresses this challenge by treating unknown parameters of a predictive model as hidden states and inferring their values at runtime using information gathered during system operation. While able to optimally and automatically trade off exploration and exploitation, dual control is computationally intractable for general interactive motion planning. In this paper, we present a novel algorithmic approach to enable active uncertainty reduction for interactive motion planning based on the implicit dual control paradigm. Our approach relies on a sampling-based approximation of stochastic dynamic programming, leading to a model predictive control problem. The resulting policy is shown to preserve the dual control effect for a broad class of predictive models with both continuous and categorical uncertainty. To ensure the safe operation of the interacting agents, we leverage a supervisory control scheme, often referred to as "shielding", which overrides the ego agent's dual control policy with a safety fallback strategy when a safety-critical event is imminent. We then augment the dual control framework with an improved variant of the recently proposed shielding-aware robust planning scheme, which proactively balances nominal planning performance against the risk of high-cost emergency maneuvers triggered by low-probability behaviors of the opponent. We demonstrate the efficacy of our approach with both simulated driving examples and hardware experiments using 1/10-scale autonomous vehicles.
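
Two ingredients named in the abstract can be illustrated in isolation: the runtime inference of a categorical hidden parameter of the opponent's predictive model, and the shielding override that replaces the nominal action with a fallback. The Python below is a minimal sketch under stated assumptions, not the paper's method: it assumes a Gaussian observation model, a hypothetical two-valued opponent intent ("yield"/"proceed"), and a hypothetical gap-based safety monitor. The names `update_belief`, `shielded_action`, and `monitor` are illustrative only, and the dual control planner itself (the sampling-based model predictive control problem) is not reproduced.

```python
import math
from typing import Callable, Dict

def gaussian_likelihood(observed: float, predicted: float, sigma: float) -> float:
    """Density of an observed opponent action under an ASSUMED Gaussian noise model."""
    z = (observed - predicted) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def update_belief(belief: Dict[str, float], observed: float,
                  predicted: Dict[str, float], sigma: float = 0.5) -> Dict[str, float]:
    """Bayes' rule over a categorical hidden parameter (hypothetical opponent intent).

    predicted[k] is the opponent action the predictive model expects under hypothesis k.
    """
    unnormalized = {k: p * gaussian_likelihood(observed, predicted[k], sigma)
                    for k, p in belief.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        return dict(belief)  # observation numerically impossible under every hypothesis; keep prior
    return {k: v / total for k, v in unnormalized.items()}

def shielded_action(nominal: float, fallback: float, state: float,
                    is_safe: Callable[[float, float], bool]) -> float:
    """Supervisory override: keep the nominal action unless the safety monitor rejects it."""
    return nominal if is_safe(state, nominal) else fallback

if __name__ == "__main__":
    # Hypothetical intents and the opponent acceleration (m/s^2) each one predicts.
    belief = {"yield": 0.5, "proceed": 0.5}
    predicted = {"yield": -1.0, "proceed": 1.0}
    belief = update_belief(belief, observed=0.8, predicted=predicted)
    print(belief)  # probability mass shifts almost entirely to "proceed"

    # Hypothetical monitor: the nominal action counts as safe only if the gap exceeds 5 m.
    monitor = lambda gap, accel: gap > 5.0
    print(shielded_action(1.5, -4.0, state=3.0, is_safe=monitor))   # -4.0 (fallback)
    print(shielded_action(1.5, -4.0, state=10.0, is_safe=monitor))  # 1.5 (nominal)
```

In dual control, the planner's own actions also shape how informative future observations will be; this sketch performs only a passive belief update and does not model that coupling.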
