Financial markets have experienced significant instability in recent years, creating unique challenges for trading and increasing interest in risk-averse strategies. Distributional Reinforcement Learning (RL) algorithms, which model the full distribution of returns rather than only their expected value, offer a promising approach to managing market uncertainty. This paper investigates this potential by studying the effectiveness of three distributional RL algorithms for natural gas futures trading and by exploring their capacity to develop risk-averse policies. Specifically, we analyze the performance and behavior of Categorical Deep Q-Network (C51), Quantile Regression Deep Q-Network (QR-DQN), and Implicit Quantile Network (IQN). To the best of our knowledge, these algorithms have never been applied in a trading context. The learned policies are compared against five Machine Learning (ML) baselines, using a detailed dataset provided by Predictive Layer SA, a company supplying ML-based strategies for energy trading. The main contributions of this study are as follows. (1) We demonstrate that distributional RL algorithms significantly outperform classical RL methods, with C51 achieving a performance improvement of more than 32\%. (2) We show that training C51 and IQN to maximize Conditional Value-at-Risk (CVaR) produces risk-sensitive policies with adjustable risk aversion. Specifically, our ablation studies reveal that lower CVaR confidence levels increase risk aversion, while higher levels decrease it, offering flexible risk management options. In contrast, QR-DQN exhibits less predictable behavior. These findings emphasize the potential of distributional RL for developing adaptable, risk-averse trading strategies in volatile markets.
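For reference, a minimal statement of the risk objective (the specific training setup is detailed in the paper body): at confidence level $\alpha \in (0, 1)$, the Conditional Value-at-Risk of a return distribution $Z$ with CDF $F_Z$ is the expected return over the worst $\alpha$-fraction of outcomes,
\[
\mathrm{CVaR}_\alpha(Z) \;=\; \mathbb{E}\!\left[\, Z \mid Z \le F_Z^{-1}(\alpha) \,\right],
\]
assuming $Z$ is atomless. Maximizing $\mathrm{CVaR}_\alpha$ with a small $\alpha$ therefore concentrates the objective on tail losses, which is consistent with the increased risk aversion observed at lower confidence levels.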