A Novel Spike Transformer Network for Depth Estimation from Event Cameras via Cross-modality Knowledge Distillation

Depth estimation is crucial for interpreting complex environments, especially in areas such as autonomous vehicle navigation and robotics. Nonetheless, obtaining accurate depth readings from event camera data remains a formidable challenge. Event cameras operate differently from traditional digital cameras: they capture data continuously and generate asynchronous binary spikes that encode time, location, and light intensity. These unique sampling mechanisms render standard image-based algorithms inadequate for processing spike data. Innovative, spike-aware algorithms tailored to event cameras are therefore needed, a task compounded by the irregularity, continuity, noise, and spatial and temporal characteristics inherent in spiking data. Harnessing the strong generalization capabilities of transformer neural networks for spatiotemporal data, we propose a purely spike-driven spike transformer network for depth estimation from spiking camera data. To address the performance limitations of spiking neural networks (SNNs), we introduce a novel single-stage cross-modality knowledge transfer framework that leverages knowledge from a large artificial neural network (ANN) vision foundation model (DINOv2) to enhance the performance of SNNs with limited data. Our experimental results on both synthetic and real datasets show substantial improvements over existing models, with notable gains in Absolute Relative and Square Relative errors (49% and 39.77% improvements over the benchmark model Spike-T, respectively). Besides accuracy, the proposed model also demonstrates reduced power consumption, a critical factor for practical applications.
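To make the reported metrics concrete, the following is a minimal sketch of how Absolute Relative and Square Relative errors are commonly computed in the depth estimation literature, together with the relative improvement of one error over a baseline. The abstract does not spell out its formulas, so the definitions below are assumed to be the conventional ones, and the function names (`abs_rel`, `sq_rel`, `relative_improvement`) and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def _valid_pairs(pred: Sequence[float], gt: Sequence[float]) -> List[Tuple[float, float]]:
    """Keep only pixels whose ground-truth depth is positive (assumed validity rule)."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no pixels with valid ground-truth depth")
    return pairs


def abs_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean of |pred - gt| / gt over valid pixels (conventional definition)."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def sq_rel(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean of (pred - gt)^2 / gt over valid pixels (conventional definition)."""
    pairs = _valid_pairs(pred, gt)
    return sum((p - g) ** 2 / g for p, g in pairs) / len(pairs)


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Illustrative values only; these are not results from the paper.
predicted = [2.0, 4.0, 9.0]
ground_truth = [2.0, 5.0, 10.0]
print(abs_rel(predicted, ground_truth))
print(sq_rel(predicted, ground_truth))
print(relative_improvement(0.2, 0.1))
```

Under this reading, a "49% improvement" in Absolute Relative error would mean the model's error is 49% lower than that of the baseline, since lower is better for both metrics.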

