Group regression is commonly used in 3D object detection to predict the box parameters of similar classes in a joint head, exploiting similarities within a group while separating highly dissimilar classes. For query-based perception methods, this has so far not been feasible. We close this gap and present a method for incorporating multi-class group regression, designed specifically for the 3D domain in the context of autonomous driving, into existing attention- and query-based perception approaches. We enhance a transformer-based joint object detection and tracking model with this approach and thoroughly evaluate its behavior and performance. For group regression, the classes of the nuScenes dataset are divided into six groups of similar shape and prevalence, each regressed by a dedicated head. We show that the proposed method is applicable to many existing transformer-based perception approaches and can potentially benefit them. The behavior of query group regression is analyzed in comparison to a unified regression head, e.g., in terms of class-switching behavior and the distribution of the output parameters. The proposed method opens several directions for further research, such as deep multi-hypothesis tracking.
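To make the idea of per-group regression heads on object queries concrete, the following minimal sketch routes a single query embedding to one of six box heads. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not list the six groups, so the split below is a hypothetical one over the ten nuScenes detection classes; the 7-value box encoding, the random linear layers standing in for trained heads, and the rule of selecting the head from the top-scoring class are likewise assumptions.

```python
import random

# Hypothetical six-group split of the ten nuScenes detection classes.
# Assumption: the abstract does not name the groups; this split is illustrative.
GROUPS = [
    ["car"],
    ["truck", "construction_vehicle"],
    ["bus", "trailer"],
    ["barrier"],
    ["motorcycle", "bicycle"],
    ["pedestrian", "traffic_cone"],
]
CLASSES = [name for names in GROUPS for name in names]
CLASS_TO_GROUP = {name: g for g, names in enumerate(GROUPS) for name in names}

BOX_DIM = 7  # assumed box encoding: x, y, z, w, l, h, yaw


def make_linear(in_dim, out_dim, rng):
    """Random weights and zero bias, standing in for a trained linear head."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def apply_linear(layer, x):
    """Compute W @ x + b for a single input vector x."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


class QueryGroupRegression:
    """One shared classification head plus one box regression head per group."""

    def __init__(self, embed_dim, seed=0):
        rng = random.Random(seed)
        self.cls_head = make_linear(embed_dim, len(CLASSES), rng)
        self.box_heads = [make_linear(embed_dim, BOX_DIM, rng) for _ in GROUPS]

    def __call__(self, query):
        # Score all classes for this query and pick the best one.
        logits = apply_linear(self.cls_head, query)
        class_idx = max(range(len(CLASSES)), key=logits.__getitem__)
        class_name = CLASSES[class_idx]
        # Assumed routing rule: regress the box with the head of the
        # group that contains the top-scoring class.
        group = CLASS_TO_GROUP[class_name]
        box = apply_linear(self.box_heads[group], query)
        return class_name, group, box
```

A unified regression head corresponds to the special case of a single group containing all classes, which is the baseline the abstract compares against.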