Many real-world systems involve physical components or operating environments with highly nonlinear and uncertain dynamics. Several control algorithms can be used to design optimal controllers for such systems, assuming a reasonably high-fidelity model of the actual system. However, the assumptions made about the stochastic dynamics of the model when designing the optimal controller may no longer be valid when the system is deployed in the real world. This paper addresses the following problem: given an optimal trajectory obtained by solving a control problem in the training environment, how do we ensure that the real-world system trajectory tracks this optimal trajectory with minimal error in a deployment environment? In other words, we want to learn how to adapt an optimal trained policy to distribution shifts in the environment. Distribution shifts are problematic in safety-critical systems, where a trained policy may lead to unsafe outcomes during deployment. We show that this problem can be cast as a nonlinear optimization problem that can be solved using heuristic methods such as particle swarm optimization (PSO). However, if we instead consider a convex relaxation of this problem, we can learn policies that track the optimal trajectory with much lower tracking error and faster computation times. We demonstrate the efficacy of our approach on tracking an optimal path using a Dubins car model, and on collision avoidance using both a linear and a nonlinear model for adaptive cruise control.
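To make the tracking problem concrete, the following minimal sketch rolls out a Dubins car under a fixed control sequence in a nominal "training" model and again in a shifted "deployment" model, then measures the mean position error between the two trajectories. It illustrates the problem setup only, not the method proposed in the paper. The forward-Euler discretization, the `speed_scale` mismatch used to stand in for a distribution shift, and all function names and numeric values are assumptions chosen for illustration.

```python
import math

def dubins_step(state, v, omega, dt):
    """One forward-Euler step of Dubins car kinematics (assumed discretization)."""
    x, y, theta = state
    return (x + v * math.cos(theta) * dt,
            y + v * math.sin(theta) * dt,
            theta + omega * dt)

def rollout(start, controls, dt, speed_scale=1.0):
    """Simulate the car under a control sequence.

    speed_scale is a hypothetical stand-in for a deployment-time shift:
    the commanded speed is scaled by this factor before being applied.
    """
    states = [start]
    for v, omega in controls:
        states.append(dubins_step(states[-1], speed_scale * v, omega, dt))
    return states

def tracking_error(reference, actual):
    """Mean Euclidean position error between two equal-length trajectories."""
    total = sum(math.hypot(rx - ax, ry - ay)
                for (rx, ry, _), (ax, ay, _) in zip(reference, actual))
    return total / len(reference)

# Illustrative values only: constant speed and turn rate for 50 steps.
controls = [(1.0, 0.2)] * 50
dt = 0.1
start = (0.0, 0.0, 0.0)

reference = rollout(start, controls, dt)                  # training model
deployed = rollout(start, controls, dt, speed_scale=0.8)  # shifted model

nominal_error = tracking_error(reference, reference)
shifted_error = tracking_error(reference, deployed)
print(f"error without shift: {nominal_error:.4f}")
print(f"error with shift:    {shifted_error:.4f}")
```

Replaying the trained controls open loop in the shifted model produces a nonzero tracking error. Adapting the policy so that this error is minimized is the optimization problem described above.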