Inspired by neuronal diversity in the biological neural system, a plethora of studies have proposed designing novel types of artificial neurons and introducing neuronal diversity into artificial neural networks. The recently proposed quadratic neuron, which replaces the inner-product operation in conventional neurons with a quadratic one, has achieved great success in many essential tasks. Despite these promising results, one issue remains unresolved: \textit{Is the superior performance of quadratic networks simply due to the increased number of parameters, or due to their intrinsic expressive capability?} Until this issue is clarified, the performance of quadratic networks will remain open to doubt. Additionally, resolving this issue reduces to finding killer applications of quadratic networks. In this paper, through theoretical and empirical studies, we show that quadratic networks enjoy parametric efficiency, thereby confirming that their superior performance is due to intrinsic expressive capability. This capability stems from the fact that quadratic neurons can easily represent nonlinear interactions, which is hard for conventional neurons. Theoretically, we derive the approximation efficiency of quadratic networks over conventional ones in terms of real space and manifolds. Moreover, from the perspective of the Barron space, we demonstrate that there exists a function space whose functions can be approximated by quadratic networks with a dimension-free error, whereas the approximation error of conventional networks depends on the dimension. Empirically, experimental results on synthetic data, classic benchmarks, and real-world applications show that quadratic models broadly enjoy parametric efficiency, and that the efficiency gain depends on the task.
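To make the claim about nonlinear interaction concrete, the following is a minimal sketch that assumes one common parameterization of a quadratic neuron; the abstract itself does not state the exact form. All names here (`quadratic_neuron`, `w_r`, `w_g`, `w_b`, `c`) are hypothetical and chosen for illustration only. Under this assumption, a single quadratic neuron reproduces the interaction term x1 * x2 exactly, whereas a single conventional (affine) pre-activation cannot.

```python
# Assumed form of a quadratic neuron's pre-activation (not taken from the abstract):
#   q(x) = (w_r . x + b_r) * (w_g . x + b_g) + w_b . (x * x) + c
# A conventional neuron's pre-activation is the affine map w . x + b.

def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def quadratic_neuron(x, w_r, b_r, w_g, b_g, w_b, c):
    """Pre-activation of the assumed quadratic neuron."""
    squared = [xi * xi for xi in x]
    return (dot(w_r, x) + b_r) * (dot(w_g, x) + b_g) + dot(w_b, squared) + c

def conventional_neuron(x, w, b):
    """Pre-activation of a conventional neuron."""
    return dot(w, x) + b

def product_via_quadratic(x):
    """Represent x1 * x2 with one quadratic neuron: w_r picks x1, w_g picks x2."""
    return quadratic_neuron(x, [1.0, 0.0], 0.0, [0.0, 1.0], 0.0, [0.0, 0.0], 0.0)

def mixed_difference(f):
    """f(1,1) - f(1,0) - f(0,1) + f(0,0); this is zero for every affine f."""
    return f([1.0, 1.0]) - f([1.0, 0.0]) - f([0.0, 1.0]) + f([0.0, 0.0])
```

The mixed difference is zero for any affine pre-activation, whatever its weights, while it equals 1 for the product x1 * x2. A conventional neuron therefore has to approximate the interaction through its activation function and additional units, which is the kind of parameter cost the abstract refers to.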