We seek to enable classic processing, with dense machine learning models, of the continuous ultra-sparse spatiotemporal data generated by event-based sensors. We propose a novel hybrid pipeline composed of asynchronous sensing and synchronous processing that combines three ideas: (1) an embedding based on PointNet models (the ALERT module) that continuously integrates new events and dismisses old ones thanks to a leakage mechanism; (2) a flexible readout of the embedded data that can feed any downstream model with always up-to-date features at any sampling rate; and (3) a patch-based approach, inspired by Vision Transformer, that exploits input sparsity to improve the efficiency of the method. These embeddings are then processed by a transformer model trained for object and gesture recognition. Using this approach, we achieve state-of-the-art performance with lower latency than competing methods. We also demonstrate that our asynchronous model can operate at any desired sampling rate.
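To illustrate ideas (1) and (2), the following is a minimal sketch of a leaky, PointNet-style embedding with an on-demand readout. It is not the ALERT module itself: the class name `LeakyEmbedding`, the exponential leak with time constant `tau`, the single linear-plus-ReLU per-event map, and the max aggregation are all assumptions made for illustration, since the abstract does not specify these details. Events are assumed to arrive in non-decreasing time order.

```python
import math


class LeakyEmbedding:
    """Hypothetical leaky, PointNet-style aggregator (illustrative only)."""

    def __init__(self, weights, tau):
        self.weights = weights  # one (wx, wy, wp, bias) row per output feature
        self.tau = tau          # assumed leak time constant, same unit as timestamps
        self.state = [0.0] * len(weights)
        self.t_last = 0.0

    def _decay(self, t):
        # Exponential leak, applied lazily whenever the state is touched.
        return math.exp(-(t - self.t_last) / self.tau)

    def _features(self, x, y, p):
        # Shared per-event map: linear layer followed by ReLU (features stay >= 0).
        return [max(0.0, wx * x + wy * y + wp * p + b)
                for wx, wy, wp, b in self.weights]

    def update(self, x, y, p, t):
        """Integrate one event asynchronously: leak the old state, then max-pool."""
        d = self._decay(t)
        f = self._features(x, y, p)
        self.state = [max(s * d, fi) for s, fi in zip(self.state, f)]
        self.t_last = t

    def read(self, t):
        """Return up-to-date features at any query time t >= t_last."""
        d = self._decay(t)
        return [s * d for s in self.state]


# Usage: two events, then a readout one time unit after the last event.
emb = LeakyEmbedding(weights=[(1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0)], tau=1.0)
emb.update(x=2.0, y=1.0, p=1.0, t=0.0)
emb.update(x=1.0, y=3.0, p=1.0, t=1.0)
features = emb.read(t=2.0)
```

Because the leak is applied lazily in both `update` and `read`, the state can be queried at any time between or after events, which is what would let a synchronous downstream model sample it at an arbitrary rate. In the patch-based setting described above, one such state per patch would only be touched when an event falls inside that patch.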