Multi-modality fusion and multi-task learning are becoming increasingly popular in 3D autonomous driving, since together they offer robust prediction within a limited computation budget. However, naively extending existing frameworks to multi-modality multi-task learning remains ineffective and can even be harmful, owing to the well-known problems of modality bias and task conflict. Previous works coordinate the learning framework manually using empirical knowledge, which may lead to suboptimal solutions. To mitigate this issue, we propose a novel yet simple multi-level gradient calibration learning framework that operates across tasks and modalities during optimization. Specifically, the gradients produced by the task heads and used to update the shared backbone are calibrated at the backbone's last layer to alleviate task conflict. Before the calibrated gradients are propagated further to the modality branches of the backbone, their magnitudes are calibrated again to the same level, ensuring that the downstream tasks pay balanced attention to the different modalities. Experiments on the large-scale nuScenes benchmark demonstrate the effectiveness of the proposed method, e.g., an absolute 14.4% mIoU improvement on map segmentation and a 1.4% mAP improvement on 3D detection, advancing the application of multi-modality fusion and multi-task learning to 3D autonomous driving. We also discuss the links between modalities and tasks.
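
To make the two calibration levels concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not state the calibration rule, so this sketch assumes that both levels simply rescale gradients to a common L2 norm (the mean norm). The function names, the toy gradients, and the `slices` layout, which assumes the last shared layer's gradient is a concatenation of camera and LiDAR parts, are all hypothetical.

```python
import math


def l2_norm(v):
    """Euclidean norm of a gradient stored as a flat list of floats."""
    return math.sqrt(sum(x * x for x in v))


def scale_to(v, target, eps=1e-12):
    """Rescale v so that its L2 norm is (approximately) `target`."""
    n = l2_norm(v)
    return [x * target / (n + eps) for x in v]


def calibrate_tasks(task_grads):
    """Level 1 (assumed rule): bring every task's gradient at the last
    shared layer to the mean norm, then sum them into one gradient."""
    norms = [l2_norm(g) for g in task_grads.values()]
    target = sum(norms) / len(norms)
    scaled = [scale_to(g, target) for g in task_grads.values()]
    return [sum(column) for column in zip(*scaled)]


def calibrate_modalities(grad, slices):
    """Level 2 (assumed rule): split the combined gradient into its
    per-modality parts and rescale each part to the mean part norm."""
    parts = {name: grad[lo:hi] for name, (lo, hi) in slices.items()}
    norms = [l2_norm(p) for p in parts.values()]
    target = sum(norms) / len(norms)
    return {name: scale_to(p, target) for name, p in parts.items()}


# Toy gradients: the detection gradient is ten times larger than the
# segmentation gradient, so it would dominate an unweighted sum.
task_grads = {
    "detection": [3.0, 0.0, 0.0, 4.0],     # norm 5.0
    "segmentation": [0.0, 0.3, 0.4, 0.0],  # norm 0.5
}
# Hypothetical layout: first two entries feed the camera branch,
# last two feed the LiDAR branch.
slices = {"camera": (0, 2), "lidar": (2, 4)}

combined = calibrate_tasks(task_grads)
per_modality = calibrate_modalities(combined, slices)

for name, g in per_modality.items():
    print(name, [round(x, 4) for x in g], "norm:", round(l2_norm(g), 4))
```

In this toy run both tasks contribute equally to the combined gradient after the first level, and both modality branches receive gradients of equal magnitude after the second. In a real training loop the same rescaling would be applied to tensors during the backward pass rather than to Python lists.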