Giving autonomous agents the ability to forecast their own outcomes and uncertainty will allow them to communicate their competencies and be used more safely. We accomplish this by using a learned world model of the agent system to forecast full agent trajectories over long time horizons. Real world systems involve significant sources of both aleatoric and epistemic uncertainty that compound and interact over time in the trajectory forecasts. We develop a deep generative world model that quantifies aleatoric uncertainty while incorporating the effects of epistemic uncertainty during the learning process. We show on two reinforcement learning problems that our uncertainty model produces calibrated outcome uncertainty estimates over the full trajectory horizon.
翻译:赋予自主智能体预测自身行为结果及其不确定性的能力,将使其能够传达自身能力边界并更安全地被使用。我们通过构建基于智能体系统的学习世界模型来实现这一目标,该模型可对智能体在长时间跨度的完整轨迹进行预测。现实世界系统存在显著的随机不确定性和认知不确定性来源,这些不确定性在轨迹预测过程中会随时间累积并相互交互。我们开发了一种深度生成式世界模型,该模型在学习过程中能量化随机不确定性,同时整合认知不确定性的影响。我们在两个强化学习问题上的实验表明,所提出的不确定性模型能够产生经过校准的完整轨迹时域结果不确定性估计。