This paper addresses the mitigation of catastrophic risk, that is, risk of very low frequency but very high severity, in sequential decision-making. The problem is particularly challenging because observations in the far tail of the distribution of cumulative costs (negative rewards) are scarce. We develop a policy gradient algorithm, called POTPG, based on approximations of the tail risk derived from extreme value theory. Numerical experiments show that our method outperforms common benchmarks that rely on the empirical distribution. We also present an application to financial risk management, specifically the dynamic hedging of a financial option.
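To make the contrast between an extreme-value approximation and the empirical distribution concrete, the following is a minimal sketch of tail-risk estimation only, not of the POTPG algorithm itself. It assumes a peaks-over-threshold approach: excesses over a high threshold are modeled with a generalized Pareto distribution (GPD), fitted here by the method of moments for simplicity, and the standard GPD formulas for value-at-risk (VaR) and conditional value-at-risk (CVaR) are then applied. The function names, the 90% threshold quantile, the 99.9% risk level, and the choice of CVaR as the risk measure are all illustrative assumptions, not details taken from the paper.

```python
import math
import random
import statistics


def pot_var_cvar(losses, threshold_quantile=0.90, level=0.999):
    """Peaks-over-threshold estimates of VaR and CVaR at `level`.

    Hypothetical helper: fits a GPD to the excesses over an empirical
    threshold by the method of moments (an assumption made for brevity).
    """
    if not threshold_quantile < level < 1.0:
        raise ValueError("level must lie above the threshold quantile")
    data = sorted(losses)
    n = len(data)
    u = data[int(threshold_quantile * n)]        # empirical threshold
    excesses = [x - u for x in data if x > u]    # amounts above the threshold
    n_u = len(excesses)
    if n_u < 2:
        raise ValueError("too few exceedances to fit the tail")

    # Method-of-moments GPD fit: mean = sigma / (1 - xi),
    # variance = sigma^2 / ((1 - xi)^2 (1 - 2 xi)), so mean^2 / var = 1 - 2 xi.
    m = statistics.fmean(excesses)
    s2 = statistics.variance(excesses)
    ratio = m * m / s2
    xi = 0.5 * (1.0 - ratio)       # shape parameter (always below 1/2 here)
    sigma = 0.5 * m * (ratio + 1)  # scale parameter

    # Probability of exceeding VaR, conditional on exceeding the threshold.
    tail_prob = (1.0 - level) * n / n_u
    if abs(xi) < 1e-9:             # exponential-tail limit of the GPD
        var = u - sigma * math.log(tail_prob)
        cvar = var + sigma
    else:
        var = u + sigma / xi * (tail_prob ** (-xi) - 1.0)
        cvar = (var + sigma - xi * u) / (1.0 - xi)
    return var, cvar


def empirical_cvar(losses, level=0.999):
    """Benchmark: average of the worst (1 - level) fraction of samples."""
    data = sorted(losses)
    k = max(1, round(len(data) * (1.0 - level)))
    return statistics.fmean(data[-k:])


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic cumulative costs; a unit exponential is used purely for illustration.
    sample = [rng.expovariate(1.0) for _ in range(5000)]
    var, cvar = pot_var_cvar(sample)
    print("POT VaR:", var, "POT CVaR:", cvar)
    print("Empirical CVaR:", empirical_cvar(sample))
```

With 5,000 samples, the empirical estimate at the 99.9% level averages only about five observations, whereas the fitted tail draws on all exceedances of the threshold (roughly 500 here). This is the kind of data scarcity the abstract describes.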