We study the control of stochastic systems with possibly continuous state and action spaces, whose state dynamics are given by $X_{t+1} = f(X_t, A_t, W_t)$. Here $X$, $A$, and $W$ denote the state, action, and exogenous random noise processes, respectively, and $f$ is a known function describing the state transitions. Traditionally, the noise process $\{W_t, t \geq 0\}$ is assumed to be independent and identically distributed, with a distribution that is either fully known or can be consistently estimated. However, distributional shifts, which are typical in engineering settings, make it necessary to account for the robustness of the policy. This paper introduces a distributionally robust stochastic control paradigm that accommodates possibly adaptive adversarial perturbations of the noise distribution within a prescribed ambiguity set. We examine two adversary models, current-action-aware and current-action-unaware, which lead to different dynamic programming equations. Furthermore, we characterize the optimal finite-sample minimax rates for uniformly learning the robust value function over a continuum of states under both adversary types, for ambiguity sets defined by $f_k$-divergence and Wasserstein distance. Finally, we demonstrate the applicability of our framework in various real-world settings.
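The difference between the two adversary models can be illustrated with a one-step toy problem. The sketch below is not the paper's formulation: it assumes a finite ambiguity set (a short list of candidate noise distributions standing in for a divergence or Wasserstein ball), two actions, two noise outcomes, and a hypothetical payoff table `q[a][w]` that plays the role of reward plus continuation value at $f(x, a, w)$. It further assumes that the current-action-unaware case can be modeled by letting the controller randomize while the adversary commits to a distribution before the action is realized. All names (`q`, `ambiguity`, `aware_value`, `unaware_value`) are illustrative.

```python
# Toy one-step comparison of the two adversary information structures.
# Assumptions (not from the paper): finite ambiguity set, two actions,
# two noise outcomes, and a hypothetical payoff table q[a][w].

def expected_payoff(q_row, mu):
    """Expected payoff of one action under noise distribution mu."""
    return sum(p * v for p, v in zip(mu, q_row))

def aware_value(q, ambiguity):
    """Adversary sees the chosen action: max over actions of the
    worst-case expected payoff over the ambiguity set."""
    return max(
        min(expected_payoff(row, mu) for mu in ambiguity)
        for row in q
    )

def unaware_value(q, ambiguity, grid=1000):
    """Adversary does not see the realized action: the controller mixes
    over the two actions with probability p, and the adversary picks the
    worst distribution against the mixture. The outer max over p is
    approximated by a grid search."""
    best = float("-inf")
    for i in range(grid + 1):
        p = i / grid
        worst = min(
            p * expected_payoff(q[0], mu) + (1 - p) * expected_payoff(q[1], mu)
            for mu in ambiguity
        )
        best = max(best, worst)
    return best

# Action 0 pays off when w = 0, action 1 pays off when w = 1.
q = [[1.0, 0.0],
     [0.0, 1.0]]
# Two candidate noise distributions over w in {0, 1}.
ambiguity = [(0.8, 0.2), (0.2, 0.8)]

v_aware = aware_value(q, ambiguity)
v_unaware = unaware_value(q, ambiguity)
print(v_aware, v_unaware)
```

In this toy instance the action-aware adversary can always shift mass away from the outcome the chosen action relies on, whereas against the action-unaware adversary the controller benefits from randomizing, so the second value is at least as large as the first. A multi-stage version would apply the same inner step inside a dynamic programming recursion.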