Quantifying the predictive uncertainty of deep semantic segmentation networks is essential in safety-critical tasks. In applications such as autonomous driving, where video data is available, convolutional long short-term memory networks can not only provide semantic segmentations but also predict the segmentations of subsequent time steps. These models take a time series of inputs and use cell states to propagate information from previous data, which allows them to predict one or more steps into the future. We present a temporal postprocessing method that estimates the prediction performance of convolutional long short-term memory networks, either by predicting the intersection over union (IoU) of predicted and ground truth segments or by classifying whether the IoU is equal to zero or greater than zero. To this end, we construct temporal, cell state-based input metrics per segment and investigate different models for estimating the predictive quality from these metrics. We further study how the number of cell states considered influences the proposed metrics.
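The two estimation targets named in the abstract, the intersection over union of a predicted segment with the ground truth and the binary decision between an IoU of zero and an IoU greater than zero, can be illustrated with a minimal sketch. It assumes each segment is given as a boolean pixel mask. The function names `segment_iou` and `meta_targets` are hypothetical, and the plain mask-level IoU used here is an assumption that may differ from the segment-wise definition used in the paper.

```python
import numpy as np


def segment_iou(pred_mask, gt_mask):
    """Plain IoU between two boolean pixel masks (illustrative assumption)."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks are empty; treat the IoU as zero by convention.
        return 0.0
    intersection = np.logical_and(pred, gt).sum()
    return float(intersection / union)


def meta_targets(pred_mask, gt_mask):
    """Return (regression target, classification target) for one segment.

    The regression target is the IoU itself; the classification target is
    1 if the IoU is greater than zero and 0 if it equals zero.
    """
    iou = segment_iou(pred_mask, gt_mask)
    return iou, int(iou > 0.0)


# Toy example: the predicted segment overlaps the ground truth in one pixel.
pred = np.array([[1, 1, 0],
                 [0, 0, 0]])
gt = np.array([[0, 1, 1],
               [0, 0, 0]])
iou, label = meta_targets(pred, gt)
```

In this toy case the masks share one pixel out of three in their union, so the regression target is 1/3 and the classification target is 1. In a postprocessing setting such targets would only be available on labelled data, where they could serve to fit the models that estimate prediction quality from the per-segment metrics.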