Recent advances in Distributional Reinforcement Learning (DRL) for modeling loss distributions have shown promise for developing hedging strategies in derivatives markets. A common approach in DRL is to learn the quantiles of the loss distribution at specified levels using Quantile Regression (QR). This approach is particularly effective in option hedging because it directly supports quantile-based risk measures such as Value at Risk (VaR) and Conditional Value at Risk (CVaR). However, these risk measures depend on accurate estimates of extreme quantiles in the tail of the loss distribution, and QR-based DRL can estimate them imprecisely because tail observations are both rare and extreme, as highlighted in the literature. To address this issue, we propose EXtreme DRL (EX-DRL), which enhances extreme quantile prediction by modeling the tail of the loss distribution with a Generalized Pareto Distribution (GPD). This method introduces supplementary data to mitigate the scarcity of extreme quantile observations, thereby improving the accuracy of QR-based estimation. Comprehensive experiments on gamma hedging of options demonstrate that EX-DRL improves on existing QR-based models by providing more precise estimates of extreme quantiles, which in turn makes the computed risk metrics more reliable for complex financial risk management.
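To make the tail-modeling idea concrete, the following is a minimal sketch of fitting a GPD to losses above a high threshold and reading VaR and CVaR from the fitted tail. It illustrates only the general peaks-over-threshold technique on a static sample, not the EX-DRL training procedure itself. The function names (`fit_gpd_tail`, `gpd_var`, `gpd_cvar`), the 0.95 threshold level, and the method-of-moments estimator are assumptions chosen for illustration; the abstract does not specify how the GPD parameters are estimated.

```python
import math
import random
import statistics


def fit_gpd_tail(losses, threshold_level=0.95):
    """Fit a GPD to excesses over an empirical threshold (hypothetical helper).

    Uses method-of-moments estimates, which assume a shape parameter below 0.5
    (finite variance). This estimator is an illustrative assumption.
    """
    ordered = sorted(losses)
    n = len(ordered)
    u = ordered[int(threshold_level * n)]         # empirical threshold
    excesses = [x - u for x in ordered if x > u]  # peaks over threshold
    m = statistics.fmean(excesses)
    v = statistics.variance(excesses)
    ratio = m * m / v
    xi = 0.5 * (1.0 - ratio)           # GPD shape
    sigma = 0.5 * m * (1.0 + ratio)    # GPD scale
    tail_frac = len(excesses) / n      # empirical P(loss > u)
    return u, xi, sigma, tail_frac


def gpd_var(u, xi, sigma, tail_frac, p):
    """VaR at level p (p above the threshold level) from the fitted tail."""
    r = (1.0 - p) / tail_frac
    if abs(xi) < 1e-9:                 # exponential-tail limit as xi -> 0
        return u - sigma * math.log(r)
    return u + sigma / xi * (r ** (-xi) - 1.0)


def gpd_cvar(u, xi, sigma, tail_frac, p):
    """CVaR (expected shortfall) at level p; requires xi < 1."""
    var_p = gpd_var(u, xi, sigma, tail_frac, p)
    return var_p / (1.0 - xi) + (sigma - xi * u) / (1.0 - xi)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic exponential losses stand in for hedging losses (assumption).
    sample = [random.expovariate(1.0) for _ in range(20000)]
    u, xi, sigma, frac = fit_gpd_tail(sample)
    print("VaR 99.9%:", gpd_var(u, xi, sigma, frac, 0.999))
    print("CVaR 99.9%:", gpd_cvar(u, xi, sigma, frac, 0.999))
```

With a 0.95 threshold, roughly 5% of the sample informs the tail fit, so the 99.9% quantile is extrapolated from the parametric tail rather than estimated from the handful of observations that lie beyond it. In practice a maximum-likelihood fit (for example via an established statistics library) may be preferable to the moment estimator used here.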