Input aggregation is a simple technique used by state-of-the-art LiDAR 3D object detectors to improve detection. However, aggregating more frames is known to yield diminishing returns and can even degrade performance, because objects respond differently to the number of aggregated frames. To address this limitation, we propose an efficient adaptive method, which we call Variable Aggregation Detection (VADet). Instead of aggregating the entire scene with a fixed number of frames, VADet performs aggregation per object and chooses the number of frames from the object's observed properties, such as speed and point density. VADet thus reduces the trade-offs inherent to fixed aggregation and is not architecture-specific. To demonstrate its benefits, we apply VADet to three popular single-stage detectors and achieve state-of-the-art performance on the Waymo dataset.
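The abstract does not specify how VADet maps an object's speed and point density to a frame count. The sketch below is a hypothetical illustration of per-object variable aggregation, not the paper's method. The linear interpolation rule, the thresholds `fast_speed` and `dense_points`, the assumption that fast or already dense objects need fewer frames, and the helper names `num_frames_for_object` and `aggregate_object_points` are all assumptions. It also assumes each object's per-frame points are already extracted and ego-motion compensated into the current frame.

```python
import numpy as np


def num_frames_for_object(speed_mps, num_points, max_frames=10,
                          fast_speed=10.0, dense_points=200):
    """Hypothetical rule: fewer frames for fast or dense objects."""
    # Speed term (assumed): a stationary object gets max_frames, an object
    # at or above fast_speed gets 1 frame, linearly interpolated in between.
    speed_ratio = min(max(speed_mps / fast_speed, 0.0), 1.0)
    k_speed = max_frames - speed_ratio * (max_frames - 1)
    # Density term (assumed): sparse objects get more frames, dense ones fewer.
    density_ratio = min(num_points / dense_points, 1.0)
    k_density = max_frames - density_ratio * (max_frames - 1)
    # Use the smaller of the two frame counts, rounded to an integer >= 1.
    return max(1, int(round(min(k_speed, k_density))))


def aggregate_object_points(frames, k):
    """Concatenate one object's points from its k most recent frames.

    frames: list of (N_i, 3) arrays, oldest first, assumed to be already
    ego-motion compensated into the current frame's coordinates.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return np.concatenate(frames[-k:], axis=0)
```

Unlike fixed aggregation, which uses one frame count for the whole scene, each object here receives its own `k`. A slow, sparse object accumulates history, while a fast object keeps only recent frames, which limits motion smearing.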