Safety-critical cyber-physical systems require control strategies whose worst-case performance is robust against adversarial disturbances and modeling uncertainties. In this paper, we present a framework for approximate control and learning in partially observed systems to minimize the worst-case discounted cost over an infinite time horizon. We model disturbances to the system as finite-valued uncertain variables with unknown probability distributions. For problems with known system dynamics, we construct a dynamic programming (DP) decomposition to compute the optimal control strategy. Our first contribution is to define information states that improve the computational tractability of this DP without loss of optimality. Then, we describe a simplification for a class of problems where the incurred cost is observable at each time instance. Our second contribution is a definition of approximate information states that can be constructed or learned directly from observed data for problems with observable costs. We derive bounds on the performance loss of the resulting approximate control strategy.
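To make the worst-case DP concrete, here is a minimal sketch. It assumes, as in the observable-cost setting mentioned above, that the realized stage cost is observed together with a partial observation of the next state. Under that assumption, one simple information state is the set of states consistent with everything observed so far. The sketch runs minimax value iteration over such sets on a hypothetical four-state toy problem. It is not the paper's construction. The dynamics, cost, observation map, discount factor, and all names (`step`, `cost`, `observe`, `update`, `q_value`, `value_iteration`) are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical toy problem (not from the paper): a position on {0, 1, 2, 3}.
STATES = (0, 1, 2, 3)
CONTROLS = (-1, 0, 1)      # move left, stay, move right
DISTURBANCES = (-1, 0, 1)  # finite-valued uncertain drift; no distribution is assumed
GAMMA = 0.9                # assumed discount factor


def step(x, u, w):
    """Assumed dynamics: drift by u + w, clipped to the state space."""
    return min(max(x + u + w, STATES[0]), STATES[-1])


def cost(x, u, w):
    """Assumed stage cost: 1 in the 'unsafe' right half, plus a small control penalty."""
    return (1.0 if x >= 2 else 0.0) + 0.1 * abs(u)


def observe(x_next):
    """Assumed partial observation of the next state: only its parity."""
    return x_next % 2


def update(info, u, y, c):
    """Next states consistent with control u, observation y, and observed cost c."""
    return frozenset(
        step(x, u, w)
        for x in info
        for w in DISTURBANCES
        if observe(step(x, u, w)) == y and cost(x, u, w) == c
    )


def q_value(V, info, u):
    """Worst case, over states in `info` and all disturbances, of cost plus discounted value."""
    worst = float("-inf")
    for x in info:
        for w in DISTURBANCES:
            c = cost(x, u, w)
            nxt = update(info, u, observe(step(x, u, w)), c)
            worst = max(worst, c + GAMMA * V[nxt])
    return worst


def value_iteration(tol=1e-9, max_iter=10_000):
    """Minimax value iteration over every nonempty set of states."""
    infos = [frozenset(s) for r in range(1, len(STATES) + 1)
             for s in combinations(STATES, r)]
    V = {s: 0.0 for s in infos}
    for _ in range(max_iter):
        V_new = {s: min(q_value(V, s, u) for u in CONTROLS) for s in infos}
        gap = max(abs(V_new[s] - V[s]) for s in infos)
        V = V_new
        if gap < tol:
            break
    # Greedy (minimizing) control for each information state.
    policy = {s: min(CONTROLS, key=lambda u: q_value(V, s, u)) for s in infos}
    return V, policy


V, policy = value_iteration()
print(V[frozenset({0})], policy[frozenset({1})])
```

On this toy instance, the value of the information state `{0}` should come out close to 0.9, and the policy should move left from `{1}`. Enumerating every subset of states is exponential in the number of states. That blow-up is the kind of tractability issue that motivates compact and approximate information states.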