We extend the Q-Learner in Black-Scholes (QLBS) framework by incorporating risk aversion and trading costs, yielding an Adaptive-QLBS model, and propose a novel Replication Learning of Option Pricing (RLOP) approach. Both methods are fully compatible with standard reinforcement learning algorithms and operate under market frictions. Using SPY and XOP option data, we evaluate performance along both static and dynamic dimensions. Adaptive-QLBS achieves higher static pricing accuracy in implied-volatility space, while RLOP delivers superior dynamic hedging performance by reducing shortfall probability. These results highlight the importance of evaluating option pricing models beyond static fit, emphasizing realized hedging outcomes.
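For concreteness, one way to read "incorporating risk aversion and trading costs" is through the one-step reward of the QLBS Markov decision process. A plausible form, following Halperin's original QLBS reward with an added proportional-cost term (the exact specification used in the paper may differ), is

\[
R_t(X_t, a_t, X_{t+1}) \;=\; \gamma\, a_t\, \Delta S_t \;-\; \lambda\, \operatorname{Var}\!\left[\Pi_t \,\middle|\, \mathcal{F}_t\right] \;-\; c\, S_t\, \lvert a_t - a_{t-1} \rvert,
\]

where $\gamma$ is the one-period discount factor, $a_t$ the hedge position, $\Delta S_t$ the (discount-adjusted) stock price change, $\lambda$ the risk-aversion coefficient penalizing the variance of the hedge portfolio $\Pi_t$, and $c$ an assumed proportional trading-cost rate.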
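The dynamic criterion, shortfall probability, can be read as the probability that a hedged short-option position finishes with negative P&L. Below is a minimal, self-contained sketch that estimates it for a plain Black-Scholes delta hedge with proportional transaction costs; all parameter values, the GBM dynamics, and the cost model are illustrative assumptions, not the paper's Adaptive-QLBS or RLOP policies or its SPY/XOP calibration.

```python
import numpy as np
from scipy.stats import norm

# Illustrative parameters (assumptions, not the paper's calibration).
S0, K, T, r, sigma = 100.0, 100.0, 0.25, 0.01, 0.20
n_steps, n_paths = 50, 20_000
cost_rate = 0.001          # proportional trading cost per dollar traded
rng = np.random.default_rng(0)
dt = T / n_steps

def bs_call_price(S, tau):
    """Black-Scholes price of a European call with time to maturity tau."""
    tau = np.maximum(tau, 1e-12)
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    d2 = d1 - sigma * np.sqrt(tau)
    return S * norm.cdf(d1) - K * np.exp(-r * tau) * norm.cdf(d2)

def bs_call_delta(S, tau):
    """Black-Scholes delta of the same call."""
    tau = np.maximum(tau, 1e-12)
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))
    return norm.cdf(d1)

# Writer sells the call at the model price and delta-hedges it.
premium = bs_call_price(S0, T)
S = np.full(n_paths, S0)
delta = bs_call_delta(S, T)
cash = -delta * S - cost_rate * np.abs(delta) * S   # buy the initial hedge

for i in range(1, n_steps + 1):
    z = rng.standard_normal(n_paths)
    S = S * np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z)
    cash = cash * np.exp(r * dt)                    # interest on the account
    if i < n_steps:                                 # rebalance before maturity
        new_delta = bs_call_delta(S, T - i * dt)
        trade = new_delta - delta
        cash -= trade * S + cost_rate * np.abs(trade) * S
        delta = new_delta

# Terminal P&L: premium plus hedge portfolio, minus the payoff owed
# (liquidation cost of the final stock position is ignored for simplicity).
pnl = premium * np.exp(r * T) + delta * S + cash - np.maximum(S - K, 0.0)
print(f"estimated shortfall probability: {np.mean(pnl < 0.0):.3f}")
```

Replacing the Black-Scholes delta with a learned hedging policy and comparing the resulting shortfall probabilities is the kind of dynamic, realized-hedging evaluation the abstract refers to.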