In human decision-making tasks, individuals learn through trial and error, guided by prediction errors. When learning a task, some individuals are more influenced by good outcomes, while others weigh bad outcomes more heavily. Such confirmation bias can lead to different learning effects. In this study, we propose a new deep reinforcement learning algorithm, CM-DQN, which applies different update strategies to positive and negative prediction errors in order to simulate human decision-making when the task's states are continuous and its actions are discrete. We test agents with confirmatory bias, disconfirmatory bias, and no bias in the Lunar Lander environment and observe their learning effects. In addition, as a contrast experiment, we apply the same confirmation model to a multi-armed bandit problem (an environment with discrete states and discrete actions) to algorithmically simulate the impact of different confirmation biases on the decision-making process. In both experiments, confirmatory bias leads to better learning. Our code can be found at https://github.com/Patrickhshs/CM-DQN.
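The core idea — scaling updates differently depending on the sign of the prediction error — can be illustrated with a minimal value-update sketch. This is an illustrative toy, not the paper's actual CM-DQN implementation; the function name and learning-rate parameters (`lr_confirm`, `lr_disconfirm`) are hypothetical:

```python
def biased_update(q, reward, lr_confirm=0.2, lr_disconfirm=0.1):
    """One confirmation-biased value update for a single action.

    q: current value estimate for the chosen action
    reward: observed outcome
    lr_confirm: learning rate applied to positive prediction errors
    lr_disconfirm: learning rate applied to negative prediction errors
    """
    # Prediction error: positive when the outcome exceeds expectation.
    delta = reward - q
    # Confirmatory bias weights positive errors more heavily
    # (lr_confirm > lr_disconfirm); swapping the rates would model
    # disconfirmatory bias, and equal rates recover unbiased learning.
    lr = lr_confirm if delta > 0 else lr_disconfirm
    return q + lr * delta

# A better-than-expected outcome moves the estimate up by lr_confirm * delta,
# while an equally surprising bad outcome moves it down by only half as much.
print(biased_update(0.0, 1.0))   # → 0.2
print(biased_update(0.0, -1.0))  # → -0.1
```

In CM-DQN this asymmetry is applied to the temporal-difference error of a deep Q-network rather than to a tabular estimate, but the sign-dependent scaling of the update is the same.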