Depth estimation based on stereo matching is a classic yet still active computer vision problem with a wide range of real-world applications. Current stereo matching methods generally adopt a deep Siamese neural network architecture and have achieved impressive performance by constructing feature matching cost volumes and using 3D convolutions for cost aggregation. However, most existing methods suffer from a large number of parameters and slow running times due to the sequential use of 3D convolutions. In this paper, we propose Ghost-Stereo, a novel end-to-end stereo matching network. The feature extraction part of the network uses GhostNet to form a U-shaped structure. The core of Ghost-Stereo consists of a GhostNet feature-based cost volume enhancement (Ghost-CVE) module and a GhostNet-inspired lightweight cost volume aggregation (Ghost-CVA) module. In the Ghost-CVE module, cost volumes are constructed from GhostNet-based features and fused to enhance spatial context awareness. In the Ghost-CVA module, a lightweight 3D convolution bottleneck block based on GhostNet is proposed to reduce the module's computational complexity. Combining this block with the context and geometry fusion module yields a classical hourglass-shaped cost volume aggregation structure. Ghost-Stereo achieves performance comparable to state-of-the-art real-time methods on several public benchmarks and shows better generalization ability.
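To give a rough sense of why a GhostNet-style block can lighten 3D cost aggregation, the sketch below compares weight counts for a standard 3D convolution and a Ghost-style 3D convolution. It is an illustration under stated assumptions, not the Ghost-CVA block itself: the abstract does not specify the block layout, so the code simply carries the original 2D Ghost module idea (a primary convolution producing a fraction of the output channels, plus a cheap depthwise convolution generating the rest) over to 3D kernels. The function names and the `ratio` and `cheap_k` parameters are hypothetical.

```python
import math


def conv3d_params(c_in, c_out, k):
    """Weights in a standard 3D convolution with a k x k x k kernel (bias ignored)."""
    return c_in * c_out * k ** 3


def ghost_conv3d_params(c_in, c_out, k, ratio=2, cheap_k=3):
    """Weights in a Ghost-style 3D convolution (bias ignored).

    Assumed layout, modeled on the 2D Ghost module:
      1. a primary convolution produces ceil(c_out / ratio) intrinsic channels;
      2. a depthwise convolution with a cheap_k^3 kernel derives the remaining
         "ghost" channels from the intrinsic ones.
    """
    intrinsic = math.ceil(c_out / ratio)
    ghost = intrinsic * (ratio - 1)
    primary = c_in * intrinsic * k ** 3
    # Depthwise: each ghost channel has a single cheap_k^3 filter.
    cheap = ghost * cheap_k ** 3
    return primary + cheap


if __name__ == "__main__":
    standard = conv3d_params(32, 32, 3)
    ghost = ghost_conv3d_params(32, 32, 3)
    print(f"standard 3D conv: {standard} weights")
    print(f"ghost-style 3D conv: {ghost} weights")
    print(f"reduction factor: {standard / ghost:.2f}x")
```

With 32 input and output channels and 3 x 3 x 3 kernels, the standard layer has 27648 weights and the Ghost-style layer 14256 under these assumptions, so the saving approaches the chosen `ratio` as the channel count grows. The figures for the actual Ghost-CVA block depend on design details not given here.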