The sequential nature of decision-making in financial asset trading aligns naturally with the reinforcement learning (RL) framework, making RL a common approach in this domain. However, the low signal-to-noise ratio in financial markets results in noisy estimates of environment components, including the reward function, which hinders effective policy learning by RL agents. Given the critical importance of reward function design in RL problems, this paper introduces a novel and more robust reward function by leveraging imitation learning, where a trend labeling algorithm acts as an expert. We integrate imitation (expert's) feedback with reinforcement (agent's) feedback in a model-free RL algorithm, effectively embedding the imitation learning problem within the RL paradigm to handle the stochasticity of reward signals. Empirical results demonstrate that this novel approach improves financial performance metrics compared to traditional benchmarks and RL agents trained solely using reinforcement feedback.
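The core idea of blending the expert's (imitation) feedback with the agent's (reinforcement) feedback can be sketched minimally as below. This is an illustrative assumption, not the paper's exact formulation: the trend-labeling expert is stood in for by the sign of a smoothed forward return, and the two feedback signals are mixed with a hypothetical weight `alpha`.

```python
import numpy as np

def trend_labels(prices, window=3):
    """Hypothetical expert: label each step +1 (up) or -1 (down)
    using the sign of the smoothed forward price change.
    A stand-in for the trend labeling algorithm described in the paper."""
    smoothed = np.convolve(prices, np.ones(window) / window, mode="same")
    fwd = np.diff(smoothed, append=smoothed[-1])
    return np.where(fwd >= 0, 1, -1)

def combined_reward(action, t, prices, labels, alpha=0.5):
    """Blend reinforcement feedback (the realized log return of the
    position, noisy) with imitation feedback (agreement with the
    expert's trend label, more robust). `alpha` is an assumed
    mixing weight, not taken from the paper."""
    market_r = action * np.log(prices[t + 1] / prices[t])  # agent's feedback
    imitation_r = 1.0 if action == labels[t] else -1.0     # expert's feedback
    return alpha * imitation_r + (1 - alpha) * market_r
```

In a clean uptrend, an agent holding a long position (`action = +1`) receives a higher combined reward than one holding a short position, and the imitation term keeps the signal informative even when the single-step market return is dominated by noise.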