Reliable quantification of epistemic and aleatoric uncertainty is crucial in applications where models are trained in one environment but applied in multiple different environments, a situation common in real-world settings such as climate science or mobility analysis. We propose a simple approach that uses surjective normalizing flows to identify out-of-distribution data sets in deep neural network models and that can be computed in a single forward pass. The method builds on recent developments in deep uncertainty quantification and generative modeling with normalizing flows. We apply our method to a synthetic data set simulated from a mechanistic model from the mobility literature, and to several data sets simulated from interventional distributions induced by soft and atomic interventions on that model, and demonstrate that it can reliably discern out-of-distribution data from in-distribution data. We compare the surjective flow model to a Dirichlet process mixture model and a bijective flow and find that the surjections are a crucial component for reliably distinguishing in-distribution from out-of-distribution data.