We introduce a simple but effective method for managing risk in model-based reinforcement learning with trajectory sampling. The method combines probabilistic safety constraints with a balance between optimism in the face of epistemic uncertainty and pessimism in the face of aleatoric uncertainty, both derived from an ensemble of stochastic neural networks. Various experiments indicate that separating the two uncertainties is essential to performing well with data-driven MPC approaches in uncertain and safety-critical control environments.
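As a rough illustration of what separating the uncertainties can look like, the sketch below splits the predictive variance of an ensemble into an aleatoric and an epistemic part using the law of total variance, then forms a score that rewards epistemic and penalizes aleatoric uncertainty. It assumes each ensemble member outputs a Gaussian mean and variance; the function names, the weights `w_epistemic` and `w_aleatoric`, and the additive form of the score are illustrative assumptions rather than the exact objective of the method, and the probabilistic safety constraints are not covered.

```python
import numpy as np


def decompose_uncertainty(means, variances):
    """Split ensemble predictions into aleatoric and epistemic variance.

    means, variances: arrays of shape (ensemble_size, ...), holding the
    Gaussian mean and variance predicted by each ensemble member
    (an assumed output format, not taken from the abstract).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Aleatoric part: average noise variance predicted by the members.
    aleatoric = variances.mean(axis=0)
    # Epistemic part: disagreement between the members' mean predictions.
    epistemic = means.var(axis=0)
    return aleatoric, epistemic


def risk_adjusted_return(means, variances, w_epistemic=1.0, w_aleatoric=1.0):
    """Hypothetical score: optimistic about epistemic uncertainty,
    pessimistic about aleatoric uncertainty."""
    aleatoric, epistemic = decompose_uncertainty(means, variances)
    expected = np.asarray(means, dtype=float).mean(axis=0)
    bonus = w_epistemic * np.sqrt(epistemic)
    penalty = w_aleatoric * np.sqrt(aleatoric)
    return expected + bonus - penalty


# Two ensemble members that disagree on the mean but agree on the noise level.
score = risk_adjusted_return([[1.0], [3.0]], [[0.25], [0.25]])
```

In a trajectory-sampling planner, a score of this kind could be used to rank candidate action sequences; setting `w_epistemic` to zero or a negative value would give a purely risk-averse variant.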