It is well known that semantic segmentation neural networks (SSNNs) produce dense segmentation maps to resolve object boundaries, while restricting prediction to down-sampled grids to alleviate the computational cost. There is a marked trade-off between the accuracy and the training cost of SSNNs such as U-Net. We propose a spectral analysis to investigate the correlations among the resolution of the down-sampled grid, the loss function, and the accuracy of SSNNs. By analyzing the network back-propagation process in the frequency domain, we find that the traditional loss function, cross-entropy, and the key features of CNNs are mainly affected by the low-frequency components of the segmentation labels. Our findings can be applied to SSNNs in several ways, including (i) determining an efficient low-resolution grid for resolving the segmentation maps, (ii) pruning the networks by truncating the high-frequency decoder features to save computation, and (iii) using block-wise weak annotation to save labeling time. The experimental results in this paper agree with our spectral analysis for networks such as DeepLab V3+ and Deep Aggregation Net (DAN).
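To make the low-frequency claim concrete, the following is a minimal NumPy sketch, not the method used in the paper. It assumes a synthetic binary label (a filled disk) in place of a real annotation, and the helper names `low_freq_energy_fraction` and `blockwise_label`, the cutoff of 1/16 cycles per pixel, and the 8-pixel block size are all hypothetical choices made for illustration. It measures how much of the label's spectral energy lies in a low-frequency band and how closely a block-wise (majority-vote) label matches the dense one.

```python
import numpy as np


def low_freq_energy_fraction(label, cutoff):
    """Fraction of the label's spectral energy with |fx|, |fy| <= cutoff.

    `cutoff` is in cycles per pixel (the Nyquist limit is 0.5).
    """
    energy = np.abs(np.fft.fft2(label)) ** 2
    fy = np.abs(np.fft.fftfreq(label.shape[0]))
    fx = np.abs(np.fft.fftfreq(label.shape[1]))
    band = (fy[:, None] <= cutoff) & (fx[None, :] <= cutoff)
    return energy[band].sum() / energy.sum()


def blockwise_label(label, block):
    """Majority-vote each block x block tile, then expand back to full size.

    Assumes both image dimensions are divisible by `block`.
    """
    h, w = label.shape
    tiles = label.reshape(h // block, block, w // block, block)
    coarse = (tiles.mean(axis=(1, 3)) >= 0.5).astype(label.dtype)
    return np.kron(coarse, np.ones((block, block), dtype=label.dtype))


# Synthetic stand-in for a segmentation label: a filled disk (assumption).
size = 128
yy, xx = np.mgrid[:size, :size]
label = ((yy - 64) ** 2 + (xx - 64) ** 2 <= 40 ** 2).astype(np.float64)

frac = low_freq_energy_fraction(label, cutoff=1 / 16)
coarse = blockwise_label(label, block=8)
agreement = (coarse == label).mean()

print(f"energy fraction below cutoff: {frac:.3f}")
print(f"pixel agreement of block-wise label: {agreement:.3f}")
```

For a smooth shape like this disk, the disagreement between the block-wise and dense labels is confined to tiles that straddle the boundary. Real annotations with thin structures or many small objects may behave differently, so the numbers printed here should not be read as representative of any dataset.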