This paper proposes a finite-horizon approximation scheme for stochastic games (SGs) and introduces episodic equilibrium as a solution concept, in which agents strategize based on the current state and the stage within the episode. It establishes an upper bound on the approximation error that decays with the episode length for both discounted and time-averaged utilities. The approach bridges the gap between the analyses of finite-horizon and infinite-horizon SGs and provides a unifying framework for time-averaged and discounted utilities. To demonstrate the effectiveness of the scheme, the paper presents episodic, decentralized (i.e., payoff-based), and model-free learning dynamics that provably reach (near) episodic equilibrium in broad classes of SGs, including zero-sum, identical-interest, and specific general-sum SGs with switching controllers, for both time-averaged and discounted utilities.
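To give a feel for why a finite-horizon approximation error can decay with the episode length, the following minimal sketch looks at the simplest possible setting: a single-agent Markov reward process with discounted rewards. It is not the paper's scheme or its bound. It only illustrates the standard geometric-tail fact that truncating a discounted sum after H stages changes the value by at most gamma^H * r_max / (1 - gamma). The transition matrix `P`, the rewards `r`, and the function names are hypothetical and chosen purely for illustration; the time-averaged case is not covered here.

```python
def truncated_value(P, r, gamma, horizon):
    """Discounted value of the first `horizon` stages, via backward induction."""
    n = len(r)
    v = [0.0] * n  # value with zero stages remaining
    for _ in range(horizon):
        v = [r[s] + gamma * sum(P[s][t] * v[t] for t in range(n))
             for s in range(n)]
    return v


def truncation_bound(r_max, gamma, horizon):
    """Standard geometric-tail bound on the discounted truncation error."""
    return gamma ** horizon * r_max / (1.0 - gamma)


# Hypothetical two-state Markov reward process (illustrative numbers only).
P = [[0.9, 0.1],
     [0.2, 0.8]]
r = [1.0, -0.5]
gamma = 0.9
r_max = max(abs(x) for x in r)

# A very long horizon serves as a numerical proxy for the infinite-horizon value.
reference = truncated_value(P, r, gamma, 2000)

gaps = {}
for H in (5, 20, 80):
    v_H = truncated_value(P, r, gamma, H)
    gaps[H] = max(abs(a - b) for a, b in zip(reference, v_H))
    print(f"H={H:3d}  gap={gaps[H]:.3e}  bound={truncation_bound(r_max, gamma, H):.3e}")
```

In this toy setting the gap shrinks geometrically in H and stays below the tail bound. In the game setting described in the abstract, strategies additionally depend on the stage within the episode, which this single-agent sketch does not model.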