Many sequential decision-making problems that are currently automated, such as those in manufacturing or recommender systems, operate in environments where there is either little uncertainty or zero risk of catastrophe. As companies and researchers attempt to deploy autonomous systems in less constrained environments, it is increasingly important to endow sequential decision-making algorithms with the ability to reason about uncertainty and risk. In this thesis, we address both planning and reinforcement learning (RL) approaches to sequential decision-making. In the planning setting, a model of the environment is assumed to be given, and a policy is optimised within that model. Reinforcement learning relies upon extensive random exploration, and therefore usually requires a simulator in which to perform training. In many real-world domains, it is impossible to construct a perfectly accurate model or simulator. The performance of any policy is therefore inevitably uncertain, owing to incomplete knowledge of the environment. Furthermore, in stochastic domains, the outcome of any given run is also uncertain due to the inherent randomness of the environment. These two sources of uncertainty are usually classified as epistemic and aleatoric uncertainty, respectively. The overarching goal of this thesis is to contribute to developing algorithms that mitigate both sources of uncertainty in sequential decision-making problems. We make a number of contributions towards this goal, with a focus on model-based algorithms...