DeepLab is a widely used deep neural network for semantic segmentation whose success is attributed to its parallel architecture, atrous spatial pyramid pooling (ASPP). ASPP applies multiple atrous convolutions with different atrous rates to extract both local and global information. However, the ASPP module uses fixed atrous rates, which restricts the size of its field of view. In principle, the atrous rate should be a hyperparameter that adjusts the field-of-view size to the target task or dataset, but no guidelines exist for choosing it. This study proposes practical guidelines for obtaining an optimal atrous rate. First, we introduce an effective receptive field for semantic segmentation to analyze the inner behavior of segmentation networks. We observe that using the ASPP module yields a specific pattern in the effective receptive field, and we trace this pattern to reveal the module's underlying mechanism. Accordingly, we derive practical guidelines for obtaining the optimal atrous rate, which should be set based on the size of the input image. Compared with other values, the optimal atrous rate consistently improved segmentation results across multiple datasets, including STARE, CHASE_DB1, HRF, Cityscapes, and iSAID.
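
To make the link between atrous rate and field of view concrete, the following minimal sketch computes the spatial extent covered by a single atrous convolution, using the standard relation k + (k - 1)(r - 1) for kernel size k and rate r. It is not the guideline proposed in this study; the function name `effective_kernel_size` is hypothetical, and the rates (6, 12, 18) are assumed here as commonly used ASPP defaults rather than values taken from the abstract.

```python
def effective_kernel_size(kernel_size: int, rate: int) -> int:
    """Spatial extent (in pixels) covered by one atrous convolution.

    An atrous (dilated) kernel inserts (rate - 1) gaps between
    neighbouring taps, so a k x k kernel spans
    k + (k - 1) * (rate - 1) pixels along each axis.
    """
    if kernel_size < 1 or rate < 1:
        raise ValueError("kernel_size and rate must be positive integers")
    return kernel_size + (kernel_size - 1) * (rate - 1)


# Assumed example rates (commonly used ASPP defaults, not from the abstract).
ASSUMED_RATES = (6, 12, 18)

# Extent of each 3x3 atrous branch along one axis.
extents = {rate: effective_kernel_size(3, rate) for rate in ASSUMED_RATES}

if __name__ == "__main__":
    for rate, extent in extents.items():
        print(f"rate={rate:2d} -> 3x3 kernel spans {extent} x {extent} pixels")
```

Because the extent depends only on the kernel size and the rate, fixed rates cover the same pixel span whatever the input resolution, which is consistent with the abstract's point that the rate should be set based on the size of the input image.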