Safe planning for an autonomous agent in interactive environments, such as controlling a self-driving vehicle among pedestrians, is a major challenge because the behavior of the environment is unknown and reacts to the behavior of the autonomous agent. This coupling gives rise to interaction-driven distribution shifts, where the autonomous agent's control policy may change the environment's behavior and thereby invalidate the safety guarantees of existing work. Recent works have used conformal prediction (CP) to generate distribution-free safety guarantees from observed data of the environment. However, CP's assumption of data exchangeability is violated in interactive settings because of a circular dependency: a control policy update changes the environment's behavior, and vice versa. To address this gap, we propose an iterative framework that robustly maintains safety guarantees across policy updates by quantifying the potential impact of a planned policy update on the environment's behavior. We realize this via adversarially robust CP: in each episode we perform a regular CP step using data observed under the current policy, and then transfer safety guarantees across policy updates by analytically adjusting the CP result to account for distribution shifts. This adjustment is based on a policy-to-trajectory sensitivity analysis and results in a safe, episodic open-loop planner. We further conduct a contraction analysis of the system, providing conditions under which both the CP results and the policy updates are guaranteed to converge. We empirically demonstrate these safety and convergence guarantees on a two-dimensional car-pedestrian case study and a high-dimensional quadcopter case study. To the best of our knowledge, these are the first results that provide valid safety guarantees in such interactive settings.
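To make the per-episode step concrete, the following is a minimal sketch of a standard split-CP threshold followed by a worst-case inflation. It is not the authors' algorithm: the function names, the scalar nonconformity scores, and the assumption that a policy update can raise any score by at most `lipschitz * policy_change_norm` are hypothetical stand-ins for the policy-to-trajectory sensitivity analysis described above.

```python
import math


def conformal_quantile(scores, delta):
    """Split-CP threshold: the ceil((n+1)(1-delta))-th smallest score.

    Under exchangeability, a fresh score falls at or below this value
    with probability at least 1 - delta.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - delta))
    if k > n:
        # Too few calibration samples for the requested confidence level.
        return math.inf
    return sorted(scores)[k - 1]


def robust_threshold(scores, delta, lipschitz, policy_change_norm):
    """Inflate the CP threshold by an assumed worst-case score shift.

    ASSUMPTION (illustrative only): after a policy update of size
    policy_change_norm, every score grows by at most
    lipschitz * policy_change_norm. Under that assumption the inflated
    threshold still covers the shifted scores with probability >= 1 - delta.
    """
    shift = lipschitz * policy_change_norm
    return conformal_quantile(scores, delta) + shift
```

Under the stated bound, adding the shift to the threshold preserves the coverage level because each shifted score is no larger than the original score plus the shift. The abstract's framework derives its adjustment analytically from a sensitivity analysis, which this sketch replaces with a single hypothetical constant.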