Hedging American Put Options with Deep Reinforcement Learning

This article uses deep reinforcement learning (DRL) to hedge American put options, applying the deep deterministic policy gradient (DDPG) method. The agents are first trained and tested on Geometric Brownian Motion (GBM) asset paths, where they outperform traditional strategies such as the Black-Scholes (BS) Delta, particularly in the presence of transaction costs. To assess the real-world applicability of DRL hedging, a second round of experiments trains DRL agents on a market-calibrated stochastic volatility model. Specifically, 80 put options across 8 symbols are collected, stochastic volatility model coefficients are calibrated for each symbol, and a DRL agent is trained for each of the 80 options by simulating paths of the respective calibrated model. DRL agents outperform the BS Delta method not only when tested on data from the same calibrated stochastic volatility model used in training, but also when hedging the true asset path that occurred between the option sale date and maturity. This study therefore presents the first DRL agents tailored to American put option hedging, and results on both simulated and empirical market testing data suggest that DRL agents outperform the BS Delta method in real-world scenarios. Finally, when a stochastic volatility model is used, this study employs a model-agnostic Chebyshev interpolation method to provide DRL agents with option prices at each time step, which gives a general framework that extends easily to more complex underlying asset processes.
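To make the GBM baseline concrete, the following is a minimal sketch of simulating GBM asset paths and accumulating the proportional transaction costs of a BS Delta hedge for a sold put. It is an illustration under stated assumptions, not the article's implementation: it uses the European BS put delta as the benchmark delta, charges a cost proportional to the traded notional, and all function names and parameter values (`simulate_gbm`, `bs_put_delta`, `delta_hedge_costs`, `cost_rate`) are hypothetical.

```python
import math
import numpy as np


def simulate_gbm(s0, mu, sigma, T, n_steps, n_paths, seed=0):
    """Simulate GBM paths; returns an array of shape (n_paths, n_steps + 1)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    z = rng.standard_normal((n_paths, n_steps))
    # Exact log-increments of GBM over one time step.
    log_inc = (mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z
    log_paths = np.cumsum(log_inc, axis=1)
    # Prepend a zero column so every path starts at s0.
    return s0 * np.exp(np.hstack([np.zeros((n_paths, 1)), log_paths]))


_erf = np.vectorize(math.erf)


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))


def bs_put_delta(s, k, r, sigma, tau):
    """European BS put delta, N(d1) - 1, used here as the benchmark (assumption)."""
    d1 = (np.log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    return norm_cdf(d1) - 1.0


def delta_hedge_costs(paths, k, r, sigma, T, cost_rate):
    """Total proportional transaction cost per path of rebalancing to the BS delta.

    Assumed cost model: cost_rate * |shares traded| * asset price at each step.
    """
    n_paths, n_cols = paths.shape
    n_steps = n_cols - 1
    dt = T / n_steps
    position = np.zeros(n_paths)    # shares currently held in the hedge
    total_cost = np.zeros(n_paths)
    for t in range(n_steps):
        tau = T - t * dt            # time to maturity, strictly positive here
        target = bs_put_delta(paths[:, t], k, r, sigma, tau)
        total_cost += cost_rate * np.abs(target - position) * paths[:, t]
        position = target
    return total_cost


# Hypothetical parameters, for illustration only.
paths = simulate_gbm(s0=100.0, mu=0.05, sigma=0.2, T=1.0, n_steps=50, n_paths=1000)
costs = delta_hedge_costs(paths, k=100.0, r=0.05, sigma=0.2, T=1.0, cost_rate=0.01)
print("mean transaction cost per path:", costs.mean())
```

Because the BS Delta strategy rebalances fully at every step regardless of cost, its accumulated cost grows with the rebalancing frequency, which is the setting in which a cost-aware learned policy has room to improve.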
