A Novel Spike Transformer Network for Depth Estimation from Event Cameras via Cross-modality Knowledge Distillation

Depth estimation is crucial for interpreting complex environments, especially in areas such as autonomous vehicle navigation and robotics. Nonetheless, obtaining accurate depth readings from event camera data remains a formidable challenge. Event cameras operate differently from traditional digital cameras: they capture data continuously and generate asynchronous binary spikes that encode time, location, and light intensity. These unique sampling mechanisms render standard image-based algorithms inadequate for processing spike data. This necessitates the development of innovative, spike-aware algorithms tailored for event cameras, a task compounded by the irregularity, continuity, noise, and spatial and temporal characteristics inherent in spiking data. Harnessing the strong generalization capabilities of transformer neural networks for spatiotemporal data, we propose a purely spike-driven spike transformer network for depth estimation from spiking camera data. To address the performance limitations of Spiking Neural Networks (SNNs), we introduce a novel single-stage cross-modality knowledge transfer framework that leverages knowledge from a large vision foundation model built on artificial neural networks (ANNs), DINOv2, to enhance the performance of SNNs with limited data. Our experimental results on both synthetic and real datasets show substantial improvements over existing models, with notable gains in Absolute Relative and Square Relative errors (improvements of 49% and 39.77%, respectively, over the benchmark model Spike-T). Beyond accuracy, the proposed model also demonstrates reduced power consumption, a critical factor for practical applications.
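The abstract reports gains in Absolute Relative and Square Relative error without defining them. The sketch below uses the definitions commonly found in the depth-estimation literature, which are assumed here rather than taken from the paper: the mean of |d - d*| / d* and the mean of (d - d*)^2 / d* over pixels with valid ground truth d*. The function names, the masking of non-positive ground-truth values, and the `relative_improvement` helper are illustrative choices, and the paper's actual evaluation protocol may differ.

```python
from typing import List, Sequence, Tuple


def _valid_pairs(pred: Sequence[float], gt: Sequence[float]) -> List[Tuple[float, float]]:
    """Keep only pixels whose ground-truth depth is positive (assumed validity rule)."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no pixels with valid ground-truth depth")
    return pairs


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Absolute Relative error: mean of |pred - gt| / gt."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def sq_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Square Relative error: mean of (pred - gt)^2 / gt."""
    pairs = _valid_pairs(pred, gt)
    return sum((p - g) ** 2 / g for p, g in pairs) / len(pairs)


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Toy depth values in metres, made up purely for illustration.
    predicted = [2.0, 4.0]
    ground_truth = [1.0, 4.0]
    print(abs_rel(predicted, ground_truth))
    print(sq_rel(predicted, ground_truth))
```

Under this reading, a reported improvement of 49% would mean the model's error is 49% lower than the baseline's for that metric. The abstract does not state how the percentage was computed, so this interpretation is also an assumption.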
