The activation function plays a crucial role in model optimisation, yet the optimal choice remains unclear. For example, the Sigmoid activation is the de facto choice in balanced classification tasks; in imbalanced classification, however, it proves inappropriate because it biases predictions towards frequent classes. In this work, we delve deeper into this phenomenon by performing a comprehensive statistical analysis of the classification and intermediate layers of both balanced and imbalanced networks, and we empirically show that aligning the activation function with the data distribution enhances performance in both balanced and imbalanced tasks. To this end, we propose the Adaptive Parametric Activation (APA) function, a novel and versatile activation function that unifies most common activation functions under a single formula. APA can be applied in both intermediate layers and attention layers, and it significantly outperforms the state of the art on several imbalanced benchmarks, such as ImageNet-LT, iNaturalist2018, Places-LT, CIFAR100-LT and LVIS, as well as balanced benchmarks such as ImageNet1K, COCO and V3DET. The code is available at https://github.com/kostas1515/AGLU.
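The abstract does not spell out the APA formula, but a family of activations that "unifies most common activation functions under a single formula" can be illustrated with a generalised-logistic (Richards-curve) form. The sketch below is an assumption for illustration only, not the paper's exact definition; the parameter names `lam` and `kappa` are hypothetical:

```python
import math

def unified_parametric_activation(z: float, lam: float = 1.0, kappa: float = 1.0) -> float:
    """Illustrative generalised-logistic activation:
    (lam * exp(-kappa * z) + 1) ** (-1 / lam).

    Special cases show how one formula can subsume common activations:
    - lam = 1, kappa = 1 recovers the standard Sigmoid 1 / (1 + exp(-z));
    - lam -> 0 approaches the Gumbel CDF exp(-exp(-kappa * z)).
    """
    return (lam * math.exp(-kappa * z) + 1.0) ** (-1.0 / lam)

# With lam = 1 this matches the Sigmoid exactly:
sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
print(unified_parametric_activation(2.0, lam=1.0))  # ≈ sigmoid(2.0)
print(sigmoid(2.0))
```

In practice the shape parameters would be learned per layer (e.g. as `torch.nn.Parameter` tensors), which is what lets the activation adapt to the data distribution.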