FATE: Pillar Encoding and Frequency-Aware Training for Event-Based Object Detection

Event cameras are bio-inspired sensors that asynchronously capture logarithmic intensity changes, offering inherent advantages in high-speed and high-dynamic-range scenarios. However, the sparse and asynchronous nature of event streams poses a fundamental challenge for modern deep learning architectures. To enable compatibility with standard models, most existing approaches partition the accumulation window into fixed temporal sub-bins. While effective for spatial processing, this internal discretization discards fine-grained temporal structure and constrains inference to the low temporal frequencies imposed by training supervision. To address this limitation, we propose FATE, a unified framework built upon a novel Pillar Encoding (PE). While operating over discrete macro-accumulation windows dictated by the target frequency, PE avoids internal temporal sub-binning. It organizes events into spatial pillars and approximates their intra-window evolution via projection onto a continuous-time orthogonal polynomial basis. This formulation yields an L2-optimal representation that retains rich temporal dynamics in a dense pseudo-image, mitigating information loss under sparse event conditions. To fully leverage this representation, we introduce Frequency-Aware Training (FAT), a soft mean-teacher curriculum that generates temporally dense pseudo-labels, effectively bridging the mismatch between low-frequency supervision and high-frequency inference. Extensive experiments demonstrate that FATE generalizes across architectural paradigms and consistently outperforms strong baselines. It enables robust object detection at high temporal resolutions up to 200 Hz, while incurring minimal overhead in parameter count and inference latency.
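
As a rough illustration of the projection step described above, the following sketch encodes one accumulation window by projecting each pixel's events onto a polynomial basis. It is not the authors' implementation. The abstract only says "orthogonal polynomial basis", so the choice of Legendre polynomials, the modeling of events as signed impulses, the function name `pillar_encode`, and all parameter names are assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import legendre


def pillar_encode(x, y, t, p, height, width, t0, t1, order=3):
    """Hypothetical sketch: per-pixel projection of events onto P_0..P_order.

    x, y : integer pixel coordinates of each event
    t    : event timestamps inside the window [t0, t1]
    p    : signed polarities (e.g. +1 / -1)
    Returns a dense pseudo-image of shape (order + 1, height, width).
    """
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    p = np.asarray(p, dtype=np.float64)

    # Map timestamps from [t0, t1] to [-1, 1], the interval on which
    # Legendre polynomials are orthogonal.
    tau = 2.0 * (np.asarray(t, dtype=np.float64) - t0) / (t1 - t0) - 1.0

    # basis[i, k] = P_k(tau_i), evaluated at the exact event time (no sub-bins).
    basis = legendre.legvander(tau, order)

    # Since the integral of P_k^2 over [-1, 1] is 2 / (2k + 1), the
    # least-squares projection coefficient of a signed impulse train is
    # c_k = (2k + 1) / 2 * sum_i p_i * P_k(tau_i).
    scale = (2.0 * np.arange(order + 1) + 1.0) / 2.0
    contrib = basis * scale * p[:, None]

    # Scatter-add each event's contribution into its pixel ("pillar").
    out = np.zeros((order + 1, height, width))
    for k in range(order + 1):
        np.add.at(out[k], (y, x), contrib[:, k])
    return out
```

Under these assumptions, each pixel stores `order + 1` coefficients instead of per-sub-bin counts, so the channel count of the pseudo-image depends on the polynomial order and not on a temporal bin count.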
