Neural-network-based predictions of event properties in astroparticle physics are becoming increasingly common. However, in many cases the result is used only as a point prediction. Statistical uncertainties and coverage (1), systematic uncertainties (2), or a goodness-of-fit measure (3) are often not calculated. Here we describe a particular choice of training and network architecture that allows all of these properties to be incorporated into a single network model. We show that a KL-divergence objective on the joint distribution of data and labels unifies supervised learning and variational autoencoders (VAEs) under the umbrella of stochastic variational inference. This unification motivates an extended supervised learning scheme that makes it possible to calculate a goodness-of-fit p-value for the neural network model. Conditional normalizing flows amortized with a neural network are crucial in this construction. We discuss how they allow coverage to be rigorously defined for posteriors defined jointly on a product space, e.g. $\mathbb{R}^n \times \mathcal{S}^m$, which encompasses posteriors over directions. Finally, systematic uncertainties are naturally included in the variational viewpoint. The proposed extended supervised training with amortized normalizing flows incorporates (1) coverage calculation, (2) systematics, and (3) a goodness-of-fit measure in a single machine-learning model. None of these properties requires constraints on the shape of the involved distributions (e.g. Gaussianity); in fact, the approach works with complex multi-modal distributions defined on product spaces like $\mathbb{R}^n \times \mathcal{S}^m$. We see great potential for exploiting this per-event information in event selections or for fast astronomical alerts that require uncertainty guarantees.
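To make the notion of per-event coverage concrete, here is a minimal sketch of a generic, sample-based coverage check using highest-posterior-density (HPD) regions. It is an illustration under stated assumptions, not necessarily the construction used in this work: it assumes only that the trained model exposes a log-density and can draw samples for each event, which an amortized normalizing flow would provide. For each event it estimates the smallest HPD credible level whose region contains the true label (the fraction of posterior samples with a higher density than the truth). Over many events, a calibrated posterior should place the truth inside the $\alpha$-level region for a fraction $\alpha$ of events. Because only density values enter, the same recipe applies in principle to any space on which the density is defined, including $\mathbb{R}^n \times \mathcal{S}^m$. The helper names (`gaussian_log_prob`, `hpd_credibility`, `empirical_coverage`, `simulate_coverage`) and the 1-D Gaussian toy posterior are hypothetical stand-ins for a real flow.

```python
import math
import random


def gaussian_log_prob(mu, sigma):
    """Hypothetical toy stand-in for a flow's per-event log_prob: a 1-D Gaussian."""
    def log_prob(z):
        return -0.5 * ((z - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))
    return log_prob


def hpd_credibility(log_prob, samples, truth):
    """Estimate the smallest HPD credible level whose region contains `truth`.

    Computed as the fraction of posterior samples whose density exceeds the
    density at the true value. Only density values are used.
    """
    lp_truth = log_prob(truth)
    n_higher = sum(1 for s in samples if log_prob(s) > lp_truth)
    return n_higher / len(samples)


def empirical_coverage(credibilities, level):
    """Fraction of events whose true value lies inside the `level` HPD region."""
    return sum(1 for c in credibilities if c <= level) / len(credibilities)


def simulate_coverage(posterior_sigma, level=0.68, n_events=1000, n_samples=400, seed=0):
    """Toy check: true labels scatter with unit width around each event's posterior mean.

    posterior_sigma == 1.0 gives a calibrated posterior; smaller values are overconfident.
    """
    rng = random.Random(seed)
    credibilities = []
    for _ in range(n_events):
        mu = rng.uniform(-5.0, 5.0)      # per-event posterior location (toy)
        truth = rng.gauss(mu, 1.0)       # true label drawn with unit spread
        log_prob = gaussian_log_prob(mu, posterior_sigma)
        samples = [rng.gauss(mu, posterior_sigma) for _ in range(n_samples)]
        credibilities.append(hpd_credibility(log_prob, samples, truth))
    return empirical_coverage(credibilities, level)


print(simulate_coverage(1.0))  # calibrated: expected near 0.68
print(simulate_coverage(0.5))  # overconfident: expected well below 0.68
```

The calibrated toy posterior should yield an empirical coverage close to the nominal 0.68, while the overconfident one (half the true width) undercovers. In practice the same comparison would be repeated over a grid of levels $\alpha$ to produce a full coverage curve.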