In reinforcement learning (RL), accounting for risk is crucial for making decisions under uncertainty, particularly in applications where safety and reliability are paramount. In this paper, we introduce a general framework for Risk-Sensitive Distributional Reinforcement Learning (RS-DisRL) with static Lipschitz Risk Measures (LRM) and general function approximation. Our framework covers a broad class of risk-sensitive RL (RSRL) problems and facilitates analysis of how the choice of estimation functions affects the effectiveness of RSRL strategies, as well as evaluation of their sample complexity. We design two meta-algorithms: \texttt{RS-DisRL-M}, a model-based strategy for model-based function approximation, and \texttt{RS-DisRL-V}, a model-free approach for general value function approximation. With our novel estimation techniques via Least Squares Regression (LSR) and Maximum Likelihood Estimation (MLE) in distributional RL with an augmented Markov Decision Process (MDP), we derive the first regret upper bound with $\widetilde{\mathcal{O}}(\sqrt{K})$ dependency for RSRL with static LRM, a step towards statistically efficient algorithms in this domain.