High-frame-rate, accurate depth estimation plays an important role in several tasks crucial to robotics and automotive perception. To date, this can be achieved through ToF and LiDAR devices for indoor and outdoor applications, respectively. However, their applicability is limited by low frame rate, high energy consumption, and spatial sparsity. Depth on Demand (DoD) enables accurate temporal and spatial depth densification by exploiting a high-frame-rate RGB sensor coupled with a potentially lower-frame-rate, sparse active depth sensor. Our proposal jointly enables lower energy consumption and denser shape reconstruction by significantly reducing the streaming requirements on the depth sensor, thanks to its three core stages: i) multi-modal encoding, ii) iterative multi-modal integration, and iii) depth decoding. We present extended evidence assessing the effectiveness of DoD on indoor and outdoor video datasets, covering both environment scanning and automotive perception use cases.