In safety-critical decision-making scenarios, the ability to identify worst-case outcomes, or dead-ends, is crucial for developing safe and reliable policies in practice. These situations are typically rife with uncertainty due to unknown or stochastic characteristics of the environment as well as limited offline training data. As a result, the value of a decision at any time point should be based on the distribution of its anticipated effects. We propose a framework to identify worst-case decision points by explicitly estimating distributions of the expected return of a decision. These estimates enable earlier indication of dead-ends in a manner that is tunable to the risk tolerance of the designed task. We demonstrate the utility of Distributional Dead-end Discovery (DistDeD) in a toy domain as well as in assessing the risk that severely ill patients in the intensive care unit reach a point where death is unavoidable. We find that DistDeD significantly improves over prior discovery approaches, providing indications of risk 10 hours earlier on average and increasing detection by 20%.