Deep model-based reinforcement learning methods offer a conceptually simple approach to the decision-making and control problem: use learning to estimate an approximate dynamics model, and offload the rest of the work to classical trajectory optimization. In practice, however, this combination has a number of empirical shortcomings that limit the usefulness of model-based methods. This thesis has two aims: to study the reasons for these shortcomings and to propose solutions to the problems uncovered. Along the way, we highlight how inference techniques from the contemporary generative modeling toolbox, including beam search, classifier-guided sampling, and image inpainting, can be reinterpreted as viable planning strategies for reinforcement learning problems.
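To make the "learn a dynamics model, then hand off to trajectory optimization" recipe concrete, here is a minimal sketch in Python with NumPy. It is not the thesis's method. The toy point-mass system, the linear least-squares model, the quadratic cost, and the random-shooting planner are all assumptions chosen for brevity, and every function name (`true_step`, `fit_linear_model`, `plan_random_shooting`, `run_demo`) is hypothetical.

```python
import numpy as np


def true_step(state, action):
    # Hypothetical ground-truth system (an assumption for this sketch):
    # a 1-D point mass with state [position, velocity] and a force action.
    pos, vel = state
    dt = 0.1
    return np.array([pos + dt * vel, vel + dt * action])


def fit_linear_model(states, actions, next_states):
    # Learning step: least-squares fit of next_state ~= [state, action] @ W.
    inputs = np.hstack([states, actions])
    W, *_ = np.linalg.lstsq(inputs, next_states, rcond=None)
    return W


def plan_random_shooting(W, state, rng, horizon=15, n_candidates=256):
    # Planning step: sample candidate action sequences, roll each one out
    # in the learned model, and return the first action of the cheapest one.
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    sim = np.tile(state, (n_candidates, 1))
    costs = np.zeros(n_candidates)
    for t in range(horizon):
        inputs = np.hstack([sim, candidates[:, t:t + 1]])
        sim = inputs @ W
        # Assumed cost: squared distance of [position, velocity] from zero.
        costs += np.sum(sim ** 2, axis=1)
    return candidates[np.argmin(costs), 0]


def run_demo(seed=0, n_steps=60):
    rng = np.random.default_rng(seed)

    # Collect random transitions from the true system and fit the model.
    states = rng.uniform(-1.0, 1.0, size=(500, 2))
    actions = rng.uniform(-1.0, 1.0, size=(500, 1))
    next_states = np.array(
        [true_step(s, a[0]) for s, a in zip(states, actions)]
    )
    W = fit_linear_model(states, actions, next_states)

    # Control loop: replan at every step, execute only the first action.
    state = np.array([1.0, 0.0])
    for _ in range(n_steps):
        action = plan_random_shooting(W, state, rng)
        state = true_step(state, action)
    return W, state


if __name__ == "__main__":
    W, final_state = run_demo()
    print("learned model:\n", W)
    print("final state:", final_state)
```

Because the toy system is exactly linear, the fitted model matches the true dynamics almost perfectly. With a neural network model on a nonlinear system, the planner can exploit model errors, which is one kind of empirical shortcoming the abstract alludes to.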