Convolutional neural networks (CNNs) are usually built by stacking convolutional operations layer by layer. Although CNNs have shown a strong capability to extract semantics from raw pixels, their capacity to capture spatial relationships between pixels across the rows and columns of an image has not been fully explored. These relationships are important for learning semantic objects with strong shape priors but weak appearance coherence, such as traffic lanes, which are often occluded or not even painted on the road surface, as shown in Fig. 1 (a). In this paper, we propose Spatial CNN (SCNN), which generalizes traditional deep layer-by-layer convolutions to slice-by-slice convolutions within feature maps, thus enabling message passing between pixels across rows and columns within a layer. SCNN is particularly suitable for long continuous shape structures or large objects with strong spatial relationships but few appearance cues, such as traffic lanes, poles, and walls. We apply SCNN to a newly released, very challenging traffic lane detection dataset and to the Cityscapes dataset. The results show that SCNN can learn the spatial relationships needed for structured output and significantly improves performance. We show that SCNN outperforms the recurrent neural network (RNN) based ReNet and MRF+CNN (MRFNet) on the lane detection dataset by 8.7% and 4.6%, respectively. Moreover, our SCNN won first place in the TuSimple Benchmark Lane Detection Challenge, with an accuracy of 96.53%.
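To make the idea of slice-by-slice convolution concrete, the following is a minimal NumPy sketch of a single downward pass over a feature map, in which each row receives a message computed from the already-updated row above it. The abstract does not specify the details, so the function names (`conv1d_same`, `propagate_downward`), the ReLU nonlinearity, the additive update, the zero padding, and the kernel width are all assumptions made for illustration rather than the authors' implementation.

```python
import numpy as np


def conv1d_same(slice_cw, kernel):
    """Cross-correlate one row slice with a kernel, keeping the width.

    slice_cw: array of shape (C_in, W), one row of the feature map.
    kernel:   array of shape (C_out, C_in, k) with odd k (assumed).
    Returns an array of shape (C_out, W); borders are zero-padded.
    """
    c_out, _, k = kernel.shape
    pad = k // 2
    width = slice_cw.shape[1]
    padded = np.pad(slice_cw, ((0, 0), (pad, pad)))
    out = np.zeros((c_out, width))
    for offset in range(k):
        # (C_out, C_in) @ (C_in, W) mixes channels at one kernel tap.
        out += kernel[:, :, offset] @ padded[:, offset:offset + width]
    return out


def propagate_downward(features, kernel):
    """Pass messages from the top row to the bottom row, slice by slice.

    features: array of shape (C, H, W).
    kernel:   array of shape (C, C, k), shared by all rows (assumed).
    Returns a new array; the input is left unchanged.
    """
    out = features.astype(float).copy()
    for row in range(1, out.shape[1]):
        # The message uses the *updated* previous row, so information
        # can travel across the full height within a single layer.
        message = np.maximum(conv1d_same(out[:, row - 1, :], kernel), 0.0)
        out[:, row, :] += message
    return out
```

In this sketch, a single active pixel in the top row influences every row below it after one pass, which is the kind of long-range propagation within a layer that the abstract describes. Passes in the other directions (upward, leftward, rightward) could presumably be obtained by flipping or transposing the spatial axes before calling the same function.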