The use of reinforcement learning algorithms in financial trading is becoming increasingly prevalent. However, the autonomous nature of these algorithms can lead to unexpected outcomes that deviate from traditional game-theoretical predictions and may even destabilize markets. In this study, we examine a scenario in which two autonomous agents, modeled with Double Deep Q-Learning, learn to liquidate the same asset optimally in the presence of market impact, using the Almgren-Chriss (2000) framework. Our results show that the strategies learned by the agents deviate significantly from the Nash equilibrium of the corresponding market impact game. Notably, the learned strategies exhibit tacit collusion, closely aligning with the Pareto-optimal solution. We further explore how different levels of market volatility influence the agents' performance and the equilibria they discover, including scenarios where volatility differs between the training and testing phases.
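To make the liquidation setting concrete, the following is a minimal sketch of the Almgren-Chriss (2000) price-impact model with linear permanent and temporary impact. The parameter values (`gamma`, `eta`, `eps`, `sigma`) and the function name are illustrative assumptions, not values from the study; the sketch computes the implementation shortfall of one liquidation schedule, the cost that the Q-learning agents implicitly minimize.

```python
import numpy as np

def almgren_chriss_shortfall(n, S0=100.0, sigma=0.02, tau=1.0,
                             gamma=2.5e-7, eta=2.5e-6, eps=0.0,
                             rng=None):
    """Simulate one liquidation run under Almgren-Chriss (2000) dynamics.

    n     : shares sold in each of the N periods (sum = total position X)
    gamma : linear permanent-impact coefficient (illustrative value)
    eta   : linear temporary-impact coefficient (illustrative value)
    eps   : fixed cost per share (e.g. half the bid-ask spread)
    Returns the implementation shortfall X*S0 - total revenue.
    """
    rng = np.random.default_rng(rng)
    n = np.asarray(n, dtype=float)
    S = S0
    revenue = 0.0
    for nk in n:
        v = nk / tau  # trading rate in this period
        # Price moves by diffusion noise plus permanent impact of the trade.
        S = S + sigma * np.sqrt(tau) * rng.standard_normal() - tau * gamma * v
        # Temporary impact depresses only this trade's execution price.
        exec_price = S - (eps + eta * v)
        revenue += nk * exec_price
    return n.sum() * S0 - revenue
```

With `sigma=0` the shortfall is deterministic, which makes the permanent-vs-temporary cost decomposition easy to check by hand; the two-agent game in the paper arises because each agent's trades add permanent impact that the other agent also pays.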