This paper examines the construction of replication portfolios in incomplete markets, a key problem in financial engineering with applications in pricing, hedging, balance sheet management, and energy storage planning. We model the problem as a two-player game between an investor and the market: the investor makes strategic bets on future states, and the market reveals outcomes. Inspired by the success of Monte Carlo Tree Search in stochastic games, we introduce an AlphaZero-based system and compare its performance to deep hedging, a widely used industry method based on gradient descent. Through theoretical analysis and experiments, we show that deep hedging struggles in environments where the $Q$-function is not subject to convexity constraints (for example, those involving non-convex transaction costs, capital constraints, or regulatory limitations), where it converges to local optima. We construct specific market environments to highlight these limitations and demonstrate that AlphaZero consistently finds near-optimal replication strategies in them. On the theoretical side, we establish a connection between deep hedging and convex optimization, which suggests that the effectiveness of deep hedging is contingent on convexity assumptions. Our experiments further suggest that AlphaZero is more sample-efficient, an important advantage in derivative markets, where data are scarce and models are prone to overfitting.