Deep neural networks have made substantial progress over the last few decades. However, because real-world data often follow a long-tailed distribution, vanilla deep models tend to be heavily biased toward the majority classes. To address this problem, state-of-the-art methods usually adopt a mixture of experts (MoE) to focus on different parts of the long-tailed distribution. The experts in these methods all share the same model depth, which neglects the fact that different classes may be fit better by models of different depths. To this end, we propose a novel MoE-based method called Self-Heterogeneous Integration with Knowledge Excavation (SHIKE). We first propose Depth-wise Knowledge Fusion (DKF), which fuses features from different shallow parts of a network with those from its deep part for each expert, making the experts more diverse in terms of representation. Building on DKF, we further propose Dynamic Knowledge Transfer (DKT) to reduce the influence of the hardest negative class, which has a non-negligible impact on the tail classes in our MoE framework. As a result, classification accuracy on long-tailed data improves significantly, especially for the tail classes. SHIKE achieves state-of-the-art performance of 56.3%, 60.3%, 75.4%, and 41.9% on CIFAR100-LT (IF100), ImageNet-LT, iNaturalist 2018, and Places-LT, respectively.
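To make the "IF100" setting concrete, the following is a minimal sketch of how per-class sample counts for a long-tailed benchmark could be generated. It assumes the commonly used exponential-decay construction, 100 classes, and a head-class size of 500 samples; none of these details is stated in the abstract, and the function name `long_tailed_counts` is hypothetical. Here the imbalance factor (IF) is taken to be the ratio between the largest and smallest class sizes.

```python
def long_tailed_counts(num_classes=100, max_count=500, imbalance_factor=100.0):
    """Return per-class sample counts that decay exponentially from head to tail.

    Assumption: class i keeps max_count * (1 / imbalance_factor) ** (i / (num_classes - 1))
    samples, so the head class keeps max_count and the last class keeps
    max_count / imbalance_factor.
    """
    counts = []
    for i in range(num_classes):
        frac = i / (num_classes - 1)  # 0.0 for the head class, 1.0 for the last tail class
        counts.append(int(max_count * (1.0 / imbalance_factor) ** frac))
    return counts


counts = long_tailed_counts()
# Head class, a mid-range class, and the rarest tail class.
print(counts[0], counts[50], counts[-1])
```

Under these assumptions the head class has 100 times as many samples as the rarest class, which illustrates why a model trained with a plain loss would be biased toward the majority classes.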