The choice of reward function in Reinforcement Learning (RL) has attracted significant attention because of its impact on system performance. Quadratic reward functions often produce significant steady-state errors. Absolute-value reward functions alleviate this problem, but they tend to induce large fluctuations, and hence abrupt changes, in certain system states. To address this challenge, this study proposes introducing an integral term into the reward function. By incorporating the integral term into a quadratic reward function, the RL algorithm is tuned to weigh the history of rewards more heavily, which in turn mitigates steady-state errors. Experiments and performance evaluations on Adaptive Cruise Control (ACC) and lane-change models validate that the proposed method effectively reduces steady-state errors without causing significant spikes in system states.
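As a rough illustration of the three reward shapes contrasted above, the sketch below compares a quadratic reward, an absolute-value reward, and a quadratic reward augmented with an integral term for a scalar tracking error. The class name, the weight `lam`, and the running-sum form of the integral term are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of the reward shapes discussed above, for a tracking
# error e_t (e.g., the gap error in ACC). The weight `lam` and the
# running-sum integral term are assumptions for illustration only.
class IntegralAugmentedReward:
    def __init__(self, lam: float = 0.1):
        self.lam = lam          # weight on the integral (history) term
        self.e_integral = 0.0   # accumulated error over the episode

    def reset(self):
        """Clear the accumulated error at the start of each episode."""
        self.e_integral = 0.0

    @staticmethod
    def quadratic(e: float) -> float:
        # Plain quadratic reward: smooth, but small residual errors are
        # weakly penalized, which can leave a steady-state error.
        return -e ** 2

    @staticmethod
    def absolute(e: float) -> float:
        # Absolute-value reward: penalizes small errors more strongly,
        # but its non-smooth shape can cause abrupt state fluctuations.
        return -abs(e)

    def quadratic_with_integral(self, e: float, dt: float = 1.0) -> float:
        # Quadratic reward plus an integral term, so the agent is also
        # penalized for error that persists over time.
        self.e_integral += e * dt
        return -(e ** 2) - self.lam * self.e_integral ** 2
```

Under these assumptions, the integral term grows whenever the error keeps the same sign across steps, so a lingering steady-state error accumulates an increasing penalty that the plain quadratic reward would never impose.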