Recent advances in Bird's Eye View (BEV) fusion for map construction have demonstrated remarkable mapping of urban environments. However, their deep and bulky architectures incur substantial backpropagation memory and computing latency. This poses an unavoidable bottleneck for constructing high-resolution (HR) BEV maps, because the large feature maps sharply increase costs such as GPU memory consumption and computing latency, a problem we call the diverging training costs issue. Affected by this problem, most existing methods adopt low-resolution (LR) BEV representations and struggle to estimate the precise locations of urban scene components such as road lanes and sidewalks. Because this imprecision makes self-driving risky, the diverging training costs issue must be resolved. In this paper, we address the issue with our novel Trumpet Neural Network (TNN) mechanism. The framework operates in LR BEV space and outputs an up-sampled semantic BEV map, yielding a memory-efficient pipeline. To this end, we introduce Local Restoration of the BEV representation. Specifically, the up-sampled BEV representation exhibits severely aliased, blocky signals and over-thick semantic labels. Our proposed Local Restoration restores the signals and thins (narrows down) the labels. Extensive experiments show that the TNN mechanism provides a plug-and-play, memory-efficient pipeline, thereby enabling effective estimation of real-sized (i.e., precise) semantic labels for BEV map construction.
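To make the pipeline concrete, the sketch below illustrates the memory-efficient idea described above, assuming a PyTorch setting. This is not the authors' implementation: the module names `TrumpetDecoder` and `LocalRestorationHead`, the layer choices, and the up-sampling factor are hypothetical; the abstract only specifies that dense prediction stays in LR BEV space and that a light Local Restoration step refines the up-sampled semantic map.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LocalRestorationHead(nn.Module):
    """Hypothetical light-weight head that cleans up aliased, blocky up-sampled
    BEV logits and thins over-wide semantic labels using only local convolutions."""

    def __init__(self, num_classes: int, hidden: int = 32):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(num_classes, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, num_classes, kernel_size=3, padding=1),
        )

    def forward(self, upsampled_logits: torch.Tensor) -> torch.Tensor:
        # Residual local refinement: restore high-frequency label boundaries.
        return upsampled_logits + self.refine(upsampled_logits)


class TrumpetDecoder(nn.Module):
    """Hypothetical decoder: dense prediction runs on LR BEV features, then the
    map is up-sampled and locally restored to HR, avoiding HR backbone features."""

    def __init__(self, in_channels: int, num_classes: int, scale: int = 4):
        super().__init__()
        self.seg_head = nn.Conv2d(in_channels, num_classes, kernel_size=1)
        self.scale = scale
        self.restoration = LocalRestorationHead(num_classes)

    def forward(self, lr_bev_feat: torch.Tensor) -> torch.Tensor:
        lr_logits = self.seg_head(lr_bev_feat)            # cheap: stays in LR BEV space
        hr_logits = F.interpolate(lr_logits, scale_factor=self.scale,
                                  mode="bilinear", align_corners=False)
        return self.restoration(hr_logits)                # light refinement at HR


# Usage sketch: an LR 50x50 BEV feature map yields a 200x200 semantic BEV map.
if __name__ == "__main__":
    decoder = TrumpetDecoder(in_channels=256, num_classes=6, scale=4)
    lr_feat = torch.randn(1, 256, 50, 50)
    hr_map = decoder(lr_feat)
    print(hr_map.shape)  # torch.Size([1, 6, 200, 200])
```

The point of the sketch is the cost asymmetry: gradients flow through large HR tensors only in the thin restoration head, while the backbone and BEV fusion operate at LR, which is what keeps training memory from diverging as the output resolution grows.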