We seek to enable classic processing of the continuous, ultra-sparse spatiotemporal data generated by event-based sensors using dense machine learning models. We propose a novel hybrid pipeline, composed of asynchronous sensing and synchronous processing, that combines several ideas: (1) an embedding based on PointNet models (the ALERT module) that can continuously integrate new events and dismiss old ones thanks to a leakage mechanism; (2) a flexible readout of the embedded data that allows any downstream model to be fed with always up-to-date features at any sampling rate; and (3) a patch-based approach, inspired by the Vision Transformer, that exploits input sparsity to optimize the efficiency of the method. These embeddings are then processed by a transformer model trained for object and gesture recognition. Using this approach, we achieve state-of-the-art performance with lower latency than competing methods. We also demonstrate that our asynchronous model can operate at any desired sampling rate.
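To make the leakage and readout ideas more concrete, the following is a minimal sketch and not the ALERT module itself. It assumes an exponential leak with a time constant `tau`, an element-wise max as the PointNet-style pooling, non-negative per-event features, and non-decreasing timestamps. The names `LeakyMaxEmbedding` and `embed_event` are hypothetical, and `embed_event` is a hand-written stand-in for a learned per-event network.

```python
import math


class LeakyMaxEmbedding:
    """Toy leaky, event-driven feature accumulator (illustrative only)."""

    def __init__(self, dim, tau):
        self.tau = tau              # assumed leak time constant, same unit as timestamps
        self.state = [0.0] * dim    # pooled feature vector
        self.last_t = 0.0           # time of the last update

    def _decayed(self, t):
        # Exponentially decay the stored state from last_t to t (assumes t >= last_t).
        leak = math.exp(-(t - self.last_t) / self.tau)
        return [v * leak for v in self.state]

    def update(self, t, feature):
        # Asynchronous step: leak the old state, then max-pool in the new event feature.
        decayed = self._decayed(t)
        self.state = [max(d, f) for d, f in zip(decayed, feature)]
        self.last_t = t

    def readout(self, t):
        # Synchronous step: return up-to-date features at any time t, without mutating state.
        return self._decayed(t)


def embed_event(x, y, polarity):
    """Hypothetical stand-in for a learned per-event MLP; outputs are non-negative."""
    return [max(0.0, x - y), max(0.0, y - x), 1.0 if polarity > 0 else 0.0]


emb = LeakyMaxEmbedding(dim=3, tau=1.0)
emb.update(0.0, embed_event(3.0, 1.0, 1))    # event feature [2.0, 0.0, 1.0]
emb.update(1.0, embed_event(1.0, 2.0, -1))   # event feature [0.0, 1.0, 0.0]
snapshot = emb.readout(2.0)                  # features a downstream model would see at t = 2
```

Because `readout` only applies the decay and leaves the state untouched, it can be called at whatever rate the downstream model needs, independently of when events arrive. In this sketch old events fade rather than being explicitly deleted, which is one simple way to realise a leak.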