We consider the idealized setting of gradient flow on the population risk for infinitely wide two-layer ReLU neural networks (without bias), and study the effect of symmetries on the learned parameters and predictors. We first describe a general class of symmetries which, when satisfied by the target function $f^*$ and the input distribution, are preserved by the dynamics. We then study more specific cases. When $f^*$ is odd, we show that the dynamics of the predictor reduces to that of a (non-linearly parameterized) linear predictor, and its exponential convergence can be guaranteed. When $f^*$ has a low-dimensional structure, we prove that the gradient flow PDE reduces to a lower-dimensional PDE. Furthermore, we present informal and numerical arguments that suggest that the input neurons align with the lower-dimensional structure of the problem.
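The following is a minimal numerical sketch of one way an odd structure can make the predictor linear. It is not the paper's proof. It assumes a finite-particle (mean-field) discretization $f(x) = \frac{1}{N}\sum_j a_j \sigma(w_j^\top x)$ with ReLU $\sigma$ and no bias. It also assumes that the particle measure is symmetric under $(a, w) \mapsto (-a, -w)$, which is a hypothesis of this illustration. Under these assumptions, the identity $\sigma(z) - \sigma(-z) = z$ gives $f(x) = \beta^\top x$ with $\beta = \frac{1}{2N}\sum_j a_j w_j$. The names `relu`, `predictor`, `a0`, `W0`, and `beta` are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def predictor(a, W, X):
    # Mean-field two-layer ReLU network without bias:
    # f(x) = (1/N) * sum_j a_j * relu(w_j . x), evaluated on each row of X.
    return relu(X @ W.T) @ a / len(a)

rng = np.random.default_rng(0)
d, m, n = 5, 200, 50  # input dimension, particles before symmetrization, test points

a0 = rng.normal(size=m)
W0 = rng.normal(size=(m, d))

# Assumed symmetry: pair every particle (a, w) with (-a, -w), so N = 2m.
a = np.concatenate([a0, -a0])
W = np.concatenate([W0, -W0])

X = rng.normal(size=(n, d))
f = predictor(a, W, X)

# Since relu(z) - relu(-z) = z, the symmetric network should equal x -> beta . x
# with beta = (1/(2N)) * sum_j a_j w_j.
beta = (a[:, None] * W).sum(axis=0) / (2 * len(a))
print(np.allclose(f, X @ beta))
```

Dropping the symmetrization step (using `a0`, `W0` directly) generally breaks this equality, because the ReLU network is then no longer an odd function of $x$.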