Strategy Synthesis for Zero-Sum Neuro-Symbolic Concurrent Stochastic Games

Neuro-symbolic approaches to artificial intelligence, which combine neural networks with classical symbolic techniques, are growing in prominence, necessitating formal approaches to reason about their correctness. We propose a novel modelling formalism called neuro-symbolic concurrent stochastic games (NS-CSGs), which comprise two probabilistic finite-state agents interacting in a shared continuous-state environment. Each agent observes the environment using a neural perception mechanism, which converts inputs such as images into symbolic percepts, and makes decisions symbolically. We focus on the class of NS-CSGs with Borel state spaces and prove the existence and measurability of the value function for zero-sum discounted cumulative rewards under piecewise-constant restrictions on the components of this class of models. To compute values and synthesise strategies, we present, for the first time, practical value iteration (VI) and policy iteration (PI) algorithms to solve this new subclass of continuous-state CSGs. These require a finite decomposition of the environment induced by the neural perception mechanisms of the agents and rely on finite abstract representations of value functions and strategies closed under VI or PI. First, we introduce a Borel measurable piecewise-constant (B-PWC) representation of value functions, extend minimax backups to this representation and propose a value iteration algorithm called B-PWC VI. Second, we introduce two novel representations for the value functions and strategies, constant-piecewise-linear (CON-PWL) and constant-piecewise-constant (CON-PWC) respectively, and propose Minimax-action-free PI by extending a recent PI method based on alternating player choices for finite state spaces to Borel state spaces, which does not require normal-form games to be solved.

翻译：神经符号人工智能方法将神经网络与经典符号技术相结合，其重要性日益增长，因此需要形式化方法来推理其正确性。我们提出了一种称为神经符号并发随机博弈（NS-CSG）的新型建模形式，该模型包含两个在共享连续状态环境中交互的概率有限状态智能体。每个智能体通过神经感知机制观察环境，该机制将图像等输入转换为符号感知，并以符号方式做出决策。我们关注具有Borel状态空间的NS-CSG类，并在该类模型组件满足分段常数限制的条件下，证明了零和折扣累积奖励值函数的存在性与可测性。为计算值并综合策略，我们首次提出了实用的值迭代（VI）与策略迭代（PI）算法，以求解此类连续状态CSG的新子类。这些算法需要基于智能体神经感知机制诱导的环境有限分解，并依赖于在VI或PI下封闭的值函数与策略的有限抽象表示。首先，我们引入了值函数的Borel可测分段常数（B-PWC）表示，将极小极大备份扩展至此表示，并提出了一种称为B-PWC VI的值迭代算法。其次，我们为值函数和策略分别引入了两种新颖表示：常数分段线性（CON-PWL）与常数分段常数（CON-PWC），并通过将近期一种基于有限状态空间交替玩家选择的PI方法扩展至Borel状态空间，提出了无需极小极大行动的PI方法，该方法无需求解正规形式博弈。