We consider the problem of learning control policies in discrete-time stochastic systems which guarantee that the system stabilizes within some specified stabilization region with probability~$1$. Our approach is based on the novel notion of stabilizing ranking supermartingales (sRSMs) that we introduce in this work. Our sRSMs overcome the limitation of methods proposed in previous works whose applicability is restricted to systems in which the stabilizing region cannot be left once entered under any control policy. We present a learning procedure that learns a control policy together with an sRSM that formally certifies probability~$1$ stability, both learned as neural networks. We show that this procedure can also be adapted to formally verifying that, under a given Lipschitz continuous control policy, the stochastic system stabilizes within some stabilizing region with probability~$1$. Our experimental evaluation shows that our learning procedure can successfully learn provably stabilizing policies in practice.
翻译:我们考虑在离散时间随机系统中学习控制策略的问题,这些策略能够保证系统以概率~$1$稳定在某个指定的稳定区域内。我们的方法基于本文引入的新概念——稳定排序超鞅(sRSMs)。我们的sRSMs克服了以往工作中所提方法的局限性,这些方法仅适用于稳定区域一旦进入便无法在任何控制策略下离开的系统。我们提出了一种学习过程,该过程同时学习控制策略和形式上证明概率~$1$稳定性的sRSM,两者均以神经网络形式学习。我们表明,该过程也可用于形式验证:在给定的Lipschitz连续控制策略下,随机系统以概率~$1$稳定在某个稳定区域内。我们的实验评估表明,该学习过程能在实践中成功学习可证明稳定的策略。