In a reinforcement learning (RL) setting, the agent's optimal strategy depends heavily on her risk preferences and on the model dynamics underlying the training environment. These two aspects influence the agent's ability to make well-informed and time-consistent decisions when facing testing environments. In this work, we devise a framework for solving robust risk-aware RL problems in which we simultaneously account for environmental uncertainty and risk via a class of dynamic robust distortion risk measures. Robustness is introduced by considering all models within a Wasserstein ball around a reference model. We estimate these dynamic robust risk measures with neural networks trained using strictly consistent scoring functions, derive policy gradient formulae via the quantile representation of distortion risk measures, and construct an actor-critic algorithm to solve this class of robust risk-aware RL problems. We demonstrate the performance of our algorithm on a portfolio allocation example.
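To make the two named ingredients concrete, here is a minimal, static (non-robust, non-neural) sketch of both: the quantile representation of a distortion risk measure, and a strictly consistent scoring function for quantiles. All specifics are assumptions for illustration only — CVaR at level 0.9 as the distortion risk measure, standard normal losses as the reference model, and plain Monte Carlo in place of the paper's neural estimators.

```python
import numpy as np

# Illustrative sketch only: CVaR (a distortion risk measure) estimated under an
# assumed standard-normal loss, showing (i) the quantile representation and
# (ii) a strictly consistent scoring function (pinball loss) for quantiles.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)   # samples of the loss X under a reference model
alpha = 0.9                    # CVaR level (assumption for this sketch)

# Quantile representation: rho(X) = \int_0^1 F_X^{-1}(u) * gamma(u) du, where
# the distortion weight for CVaR_alpha is gamma(u) = 1/(1-alpha) on [alpha, 1].
us = np.linspace(0.0005, 0.9995, 1000)      # interior grid of quantile levels
gamma = np.where(us >= alpha, 1.0 / (1.0 - alpha), 0.0)
rho = np.sum(gamma * np.quantile(x, us)) * (us[1] - us[0])  # Riemann sum

# Pinball loss: a strictly consistent scoring function for the u-quantile;
# minimising its empirical average recovers the sample quantile. This is the
# principle that lets a neural network be fitted to a quantile by loss
# minimisation; here we just minimise over a grid of candidate values.
def pinball(q, x, u):
    return np.mean(np.maximum(u * (x - q), (u - 1.0) * (x - q)))

cands = np.linspace(-4.0, 4.0, 2001)
q_hat = cands[np.argmin([pinball(q, x, alpha) for q in cands])]
```

For standard normal losses, `rho` should land near the analytic value CVaR_0.9 ≈ 1.755, and `q_hat` near the 0.9-quantile ≈ 1.282; the paper replaces the grid minimiser with a neural network and the fixed reference model with a Wasserstein ball of alternatives.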