The value function plays a crucial role in both reinforcement learning and optimal control as a measure of the cumulative future reward an agent receives. It is therefore of interest to study how similar the values of neighboring states are, i.e., to investigate the continuity of the value function. We do so by providing and verifying upper bounds on the value function's modulus of continuity. Additionally, we show that the value function is always Hölder continuous under relatively weak assumptions on the underlying system, and that non-differentiable value functions can be made differentiable by slightly "disturbing" the system.
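For orientation, the two notions of continuity named above can be written out using their standard textbook definitions. The notation is an assumption made here for illustration and is not taken from the text: $V$ denotes the value function, $\mathcal{S}$ the state space, $d$ a metric on $\mathcal{S}$, and $C$, $\alpha$ generic constants. The modulus of continuity of $V$ is

\[
\omega_V(\delta) \;=\; \sup\bigl\{\, |V(s) - V(s')| \;:\; s, s' \in \mathcal{S},\ d(s, s') \le \delta \,\bigr\}, \qquad \delta \ge 0,
\]

and $V$ is Hölder continuous with exponent $\alpha \in (0, 1]$ if there is a constant $C > 0$ such that

\[
|V(s) - V(s')| \;\le\; C \, d(s, s')^{\alpha} \qquad \text{for all } s, s' \in \mathcal{S},
\]

which is equivalent to the bound $\omega_V(\delta) \le C\,\delta^{\alpha}$ for all $\delta \ge 0$. The case $\alpha = 1$ is Lipschitz continuity, so an upper bound on $\omega_V$ directly quantifies how much the values of neighboring states can differ.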