Future reward estimation is a core component of reinforcement learning agents, e.g., the Q-value and state-value functions that predict an agent's sum of future rewards. Their scalar output, however, obscures when individual future rewards will arrive or what their values will be. We address this by modifying an agent's future reward estimator to predict its next N expected rewards, referred to as Temporal Reward Decomposition (TRD). This unlocks novel explanations of agent behaviour. Through TRD we can: estimate when an agent may expect to receive a reward, the value of the reward and the agent's confidence in receiving it; measure an input feature's temporal importance to the agent's action decisions; and predict the influence of different actions on future rewards. Furthermore, we show that DQN agents trained on Atari environments can be efficiently retrained to incorporate TRD with minimal impact on performance.
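The decomposition described above can be illustrated with a minimal sketch. This is not the paper's implementation; it only assumes the general idea stated in the abstract: a TRD head outputs the next N expected rewards (plus some estimate of the return beyond step N), and the scalar Q-value can be recovered as their discounted sum. All variable names and values here are illustrative.

```python
import numpy as np

# Hypothetical illustration of Temporal Reward Decomposition (TRD).
# Instead of a single scalar Q-value, the estimator outputs the next N
# expected rewards, plus a tail estimate for rewards beyond step N.
gamma = 0.99          # discount factor (illustrative value)
N = 5                 # number of decomposed future reward predictions

# Per-step expected rewards a TRD head might predict for one (state, action)
expected_rewards = np.array([0.0, 0.0, 1.0, 0.0, 0.5])
tail_value = 2.0      # illustrative estimate of the return after N steps

# Reassembling the scalar Q-value from its temporal decomposition:
# Q(s, a) = sum_{i=0}^{N-1} gamma^i * E[r_{t+i}] + gamma^N * tail
discounts = gamma ** np.arange(N)
q_value = np.sum(discounts * expected_rewards) + gamma ** N * tail_value
```

The decomposed vector `expected_rewards` makes visible what the scalar `q_value` hides: here the agent expects its first reward two steps from now, which is the kind of "when and what" explanation the abstract describes.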