We present a framework for safety-critical optimal control of physical systems based on denoising diffusion probabilistic models (DDPMs). Control barrier functions (CBFs), which encode the desired safety constraints, are combined with DDPMs to plan actions by iteratively denoising trajectories through a CBF-guided sampling procedure. The generated trajectories are simultaneously guided to maximize a future cumulative reward that represents the task to be executed optimally. The proposed scheme can be viewed as an offline, model-based reinforcement learning algorithm whose functionality resembles receding-horizon model-predictive control, in which the selected actions lead to optimal and safe trajectories.
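To make the idea of CBF-guided denoising concrete, the following is a minimal sketch of a single guided reverse step, not the authors' implementation. Everything specific in it is an assumption made for illustration: the circular-obstacle barrier `h`, the discrete-time CBF condition h(x[k+1]) >= (1 - gamma) * h(x[k]), the squared-hinge penalty, the goal-distance reward, the finite-difference gradients, and the classifier-guidance-style shift of the denoiser mean by `sigma**2` times the guidance gradient. The trained denoiser is replaced by a hypothetical `denoiser_mean` callable (an identity map in the demo).

```python
import numpy as np

def h(x, center, radius):
    """Assumed barrier: h(x) >= 0 exactly when x lies outside a circular obstacle."""
    return np.sum((x - center) ** 2, axis=-1) - radius ** 2

def cbf_violation(traj, center, radius, gamma=0.5):
    """Squared-hinge penalty on the discrete-time CBF condition
    h(x[k+1]) >= (1 - gamma) * h(x[k]) for a trajectory of shape (T, 2)."""
    hv = h(traj, center, radius)
    slack = (1.0 - gamma) * hv[:-1] - hv[1:]  # positive where the condition fails
    return np.sum(np.maximum(slack, 0.0) ** 2)

def numerical_grad(f, traj, eps=1e-5):
    """Central finite-difference gradient of a scalar function f at traj."""
    grad = np.zeros_like(traj)
    for idx in np.ndindex(traj.shape):
        step = np.zeros_like(traj)
        step[idx] = eps
        grad[idx] = (f(traj + step) - f(traj - step)) / (2 * eps)
    return grad

def guided_denoise_step(traj, denoiser_mean, sigma, reward, violation, rng,
                        w_reward=1.0, w_safe=1.0, add_noise=True):
    """One guided reverse step (hypothetical form): shift the denoiser mean
    up the reward gradient and down the CBF-violation gradient, then
    optionally add Gaussian noise of standard deviation sigma."""
    mean = denoiser_mean(traj)
    guide = (w_reward * numerical_grad(reward, mean)
             - w_safe * numerical_grad(violation, mean))
    out = mean + sigma ** 2 * guide
    if add_noise:
        out = out + sigma * rng.standard_normal(traj.shape)
    return out

# Toy setup (all values are illustrative assumptions).
center, radius = np.array([0.0, 0.2]), 1.0
goal = np.array([2.0, 0.0])
# A straight-line trajectory that cuts through the obstacle.
traj = np.stack([np.linspace(-2.0, 2.0, 9), np.zeros(9)], axis=1)

violation = lambda t: cbf_violation(t, center, radius)
reward = lambda t: -np.sum((t - goal) ** 2)  # cumulative negative distance to goal
identity = lambda t: t                       # stand-in for a trained denoiser

rng = np.random.default_rng(0)
safer = guided_denoise_step(traj, identity, 0.03, reward, violation, rng,
                            w_reward=0.0, add_noise=False)
```

In a full planner this step would be repeated over the whole noise schedule with a learned denoiser, and only the first action of the final trajectory would be applied before replanning, which is what gives the receding-horizon behavior described above. The weights `w_reward` and `w_safe` are hypothetical knobs for trading off task performance against constraint satisfaction.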