Quantifying predictive uncertainty has emerged as a potential solution to common shortcomings of deep neural networks, such as overconfidence and a lack of explainability and robustness, although it is often computationally expensive. Many real-world applications are multi-modal in nature and therefore benefit from multi-task learning. In autonomous driving, for example, jointly solving semantic segmentation and monocular depth estimation has proven valuable. In this work, we first combine several uncertainty quantification methods with joint semantic segmentation and monocular depth estimation and compare their performance against each other. We further show that multi-task learning improves uncertainty quality compared to solving both tasks separately. Based on these insights, we introduce EMUFormer, a novel student-teacher distillation approach for joint semantic segmentation and monocular depth estimation that also provides efficient multi-task uncertainty quantification. By implicitly leveraging the predictive uncertainties of the teacher, EMUFormer achieves new state-of-the-art results on Cityscapes and NYUv2. It additionally estimates high-quality predictive uncertainties for both tasks that are comparable or superior to those of a Deep Ensemble, despite being an order of magnitude more efficient.
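The Deep Ensemble used as the reference point derives its uncertainty from disagreement between independently trained members. The following is a minimal per-pixel sketch of that idea for both tasks. It assumes the member outputs for a single pixel have already been collected. It uses the predictive entropy of the mean softmax distribution for segmentation and the variance of the member depth predictions for depth. These are common choices, not necessarily the exact measures used in this work. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def segmentation_uncertainty(member_probs: Sequence[Sequence[float]]) -> Tuple[List[float], float]:
    """Average the class probabilities of all ensemble members for one pixel and
    return the mean distribution together with its predictive entropy (in nats)."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]
    # Entropy is highest when members disagree or are individually unsure.
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    return mean, entropy


def depth_uncertainty(member_depths: Sequence[float]) -> Tuple[float, float]:
    """Return the mean depth for one pixel and the variance across members.
    Simplification (assumption): only the spread of the members' point
    predictions is used; per-member predicted variances are ignored."""
    n = len(member_depths)
    mean = sum(member_depths) / n
    var = sum((d - mean) ** 2 for d in member_depths) / n
    return mean, var


# Three hypothetical members, two classes: agreement vs. disagreement.
_, h_agree = segmentation_uncertainty([[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]])
mean_dis, h_disagree = segmentation_uncertainty([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]])
d_mean, d_var = depth_uncertainty([10.0, 12.0, 14.0])
```

Each of the M members needs its own forward pass, so the inference cost grows linearly with the ensemble size. This is the expense that a single distilled student network is meant to avoid.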