The AllInOne training paradigm squeezes a wide range of tasks into a unified model through multi-task learning. However, optimization in multi-task learning is more challenging than in single-task learning, because the gradient norms from different tasks may vary greatly, biasing the shared backbone heavily towards one specific task. To address this issue, we propose the task-level backbone-oriented gradient clip paradigm. Compared with the vanilla gradient clip method, it differs in two respects: 1) gradient clipping is performed independently for each task; 2) the backbone gradients generated by each task are rescaled to the same norm. Based on our experimental results, we argue that the task-level backbone-oriented gradient clip paradigm can alleviate the gradient bias problem to some extent. We also propose a novel multi-branch data augmentation strategy in which conflicting augmentations are placed in different branches. Our approach proved effective, finally achieving 1st place on Leaderboard A and 2nd place on Leaderboard B of the CVPR2023 Foundation Model Challenge. It is worth noting that, whereas Leaderboard A evaluates all three tasks (detection, segmentation, and fine-grained classification), Leaderboard B does not evaluate segmentation, the task on which our team holds a large advantage.
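The two steps of the gradient clip paradigm can be illustrated with a small, framework-free sketch. It rests on several assumptions and is not the authors' implementation. Gradients are flattened into plain Python lists. Each task's "clip" is taken to be vanilla L2-norm clipping applied to that task's own gradient (the backbone part plus its own head). The shared norm that the backbone gradients are rescaled to is taken to be the mean of the clipped per-task backbone norms, because the abstract does not specify this target. The function `task_level_backbone_clip` and its parameters are hypothetical names introduced for illustration.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def scale(v: Vector, factor: float) -> Vector:
    return [x * factor for x in v]


def clip_by_norm(v: Vector, max_norm: float, eps: float = 1e-6) -> Vector:
    """Vanilla norm clipping: shrink v only if its L2 norm exceeds max_norm."""
    norm = l2_norm(v)
    if norm > max_norm:
        return scale(v, max_norm / (norm + eps))
    return list(v)


def task_level_backbone_clip(
    backbone_grads: Dict[str, Vector],  # per-task gradient w.r.t. the shared backbone
    head_grads: Dict[str, Vector],      # per-task gradient w.r.t. that task's own head
    max_norm: float,
) -> Tuple[Vector, Dict[str, Vector]]:
    """Hypothetical sketch: per-task clipping, then equal-norm backbone rescaling."""
    if not backbone_grads:
        raise ValueError("need at least one task")

    # Step 1: clip each task's gradient independently (not one global clip).
    clipped_backbone: Dict[str, Vector] = {}
    clipped_heads: Dict[str, Vector] = {}
    for task, g_backbone in backbone_grads.items():
        full = clip_by_norm(g_backbone + head_grads[task], max_norm)
        clipped_backbone[task] = full[: len(g_backbone)]
        clipped_heads[task] = full[len(g_backbone):]

    # Step 2: rescale every task's backbone gradient to one shared norm.
    # Assumption: the shared norm is the mean of the clipped backbone norms.
    norms = {t: l2_norm(g) for t, g in clipped_backbone.items()}
    target = sum(norms.values()) / len(norms)

    # Sum the equal-norm backbone contributions into a single update.
    dim = len(next(iter(clipped_backbone.values())))
    combined = [0.0] * dim
    for task, g in clipped_backbone.items():
        if norms[task] > 0.0:  # leave all-zero gradients untouched
            g = scale(g, target / norms[task])
        combined = [c + x for c, x in zip(combined, g)]
    return combined, clipped_heads


# Toy example: detection's backbone gradient (norm 50) dwarfs classification's (norm 0.5).
combined, heads = task_level_backbone_clip(
    {"det": [30.0, 40.0], "cls": [0.3, 0.4]},
    {"det": [0.0], "cls": [0.0]},
    max_norm=1.0,
)
# combined ≈ [0.9, 1.2]
```

In the toy call, a single global clip of the summed gradient would leave the classification task contributing about 1% of the backbone update, since its gradient norm is 100 times smaller. After the per-task rescaling, both tasks contribute backbone vectors of equal norm.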