We address the synthesis of control policies for unknown discrete-time stochastic dynamical systems to satisfy temporal logic objectives. We present a data-driven, abstraction-based control framework that integrates online learning with novel incremental game-solving. Under appropriate continuity assumptions, our method abstracts the system dynamics into a finite stochastic (2.5-player) game graph derived from data. Given a requirement over time on this graph, we compute the winning region -- i.e., the set of initial states from which the objective is satisfiable -- in the resulting game, together with a corresponding control policy. Our main contribution is the construction of abstractions, winning regions and control policies \emph{incrementally}, as data about the system dynamics accumulates. Concretely, our algorithm refines under- and over-approximations of reachable sets for each state-action pair as new data samples arrive. These refinements induce structural modifications in the game graph abstraction -- such as the addition or removal of nodes and edges -- which in turn modify the winning region. Crucially, we show that these updates are inherently monotonic: under-approximations only grow, over-approximations only shrink, and the winning region only expands. We exploit this monotonicity by defining an objective-induced ranking function on the nodes of the abstract game that increases monotonically as new data samples are incorporated. These ranks underpin our novel incremental game-solving algorithm, which employs customized gadgets (DAG-like subgames) within a rank-lifting algorithm to efficiently update the winning region. Numerical case studies demonstrate significant computational savings compared to the baseline approach, which re-solves the entire game from scratch whenever new data samples arrive.
翻译:暂无翻译