Delays are inherent to most dynamical systems. Besides shifting a process in time, they can significantly affect its performance, so it is usually worthwhile to study the delay and account for it. Sequential decision-making problems such as Markov decision processes (MDPs) are themselves dynamical systems, so it is no surprise that they can also be affected by delays. MDPs are the foundational framework of reinforcement learning (RL), a paradigm whose goal is to create artificial agents capable of learning to maximise their utility by interacting with their environment. RL has achieved strong, sometimes astonishing, empirical results, but delays are seldom explicitly accounted for, and the understanding of their impact on the MDP remains limited. In this dissertation, we study delays in the agent's observation of the state of the environment or in the execution of the agent's actions. We repeatedly change our point of view on the problem to reveal some of its structure and peculiarities. A wide spectrum of delays is considered, and potential solutions are presented. This dissertation also aims to draw links between celebrated frameworks of the RL literature and the framework of delays.