Neural models based on hypercomplex algebra systems are proliferating across a wide range of applications, from computer vision to natural language processing. As they are adopted more widely, parameterized hypercomplex neural networks (PHNNs) are growing in size, yet no technique has so far been adopted to control their convergence at large scale. In this paper, we study the convergence of PHNNs and propose parameterized hypercomplex identity initialization (PHYDI), a method that improves their convergence at different scales, yielding more robust performance as the number of layers increases while reaching the same performance in fewer iterations. We show the effectiveness of this approach on different benchmarks and with common PHNNs based on ResNet and Transformer architectures. The code is available at https://github.com/ispamm/PHYDI.