This paper contributes to the existing literature on hedging American options with Deep Reinforcement Learning (DRL). The study first investigates the impact of hyperparameters on hedging performance, considering learning rates, training episodes, neural network architectures, training steps, and transaction cost penalty functions. The results highlight the importance of avoiding certain combinations, such as high learning rates with a large number of training episodes or low learning rates with few training episodes, and emphasize the value of moderate settings for optimal outcomes. The paper also warns against an excessive number of training steps, which can cause instability, and demonstrates the superiority of a quadratic transaction cost penalty function over a linear one. The study then expands upon the work of Pickard et al. (2024), who use a Chebyshev interpolation option pricing method to train DRL agents under market-calibrated stochastic volatility models. While Pickard et al. (2024) show that these DRL agents achieve satisfactory performance on empirical asset paths, this study introduces a novel approach in which new agents are re-trained at weekly intervals on newly calibrated stochastic volatility models. The results show that DRL agents re-trained on weekly market data outperform those trained solely on the sale date. Furthermore, the paper demonstrates that both the single-train and weekly-train DRL agents outperform the Black-Scholes Delta method at transaction cost levels of 1% and 3%. This practical relevance suggests that practitioners can leverage readily available market data to train DRL agents for effective hedging of the options in their portfolios.
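For context on the two penalty shapes compared above, a minimal sketch of how linear and quadratic transaction cost penalties are commonly written in the DRL hedging literature is given below; the notation (cost rate kappa, asset price S_t, and holding n_t after rebalancing at time t) is assumed for illustration and is not necessarily the paper's exact specification:

    linear penalty:    P_t = \kappa \, S_t \, |n_t - n_{t-1}|
    quadratic penalty: P_t = \kappa \, \bigl( S_t \, (n_t - n_{t-1}) \bigr)^2

In either case the penalty is subtracted from the agent's per-step reward, so the quadratic form penalizes large individual trades more heavily and tends to encourage smoother rebalancing.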