Classical neural networks with random initialization famously behave as Gaussian processes in the limit of many neurons, which allows one to completely characterize their training and generalization behavior. No such general understanding exists for quantum neural networks (QNNs), which, outside of certain special cases, are known not to behave as Gaussian processes when randomly initialized. Here we prove that QNNs and their first two derivatives instead generally form what we call "Wishart processes," where certain algebraic properties of the network determine the hyperparameters of the process. This Wishart process description allows us, for the first time, to: give necessary and sufficient conditions for a QNN architecture to have a Gaussian process limit; calculate the full gradient distribution, generalizing previously known barren plateau results; and calculate the local minima distribution of algebraically constrained QNNs. Our unified framework suggests a simple operational definition for the "trainability" of a given QNN model, using a newly introduced, experimentally accessible quantity we call the "degrees of freedom" of the network architecture.
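To make the claimed Gaussian process limit concrete, the following is a minimal numerical sketch (our illustration, not the paper's method): a scalar Wishart-distributed variable, built from a growing number of independent complex Gaussian summands, loses its skew and approaches a Gaussian as that count grows, mirroring the abstract's statement that architectures with sufficiently many "degrees of freedom" recover a Gaussian process limit. The function name `wishart_samples` and the parameter `dof` are hypothetical labels for this sketch.

```python
# Sketch: a standardized scalar Wishart variable becomes Gaussian
# as its degrees of freedom grow (central limit theorem).
import numpy as np

rng = np.random.default_rng(0)

def wishart_samples(dof: int, n_samples: int) -> np.ndarray:
    """Sample w = sum_i |g_i|^2 over `dof` standard complex
    Gaussians (a 1x1 Wishart variable), then standardize to
    zero mean and unit variance."""
    g = rng.normal(size=(n_samples, dof)) + 1j * rng.normal(size=(n_samples, dof))
    w = np.sum(np.abs(g) ** 2, axis=1)  # chi-square with 2*dof real dof
    return (w - w.mean()) / w.std()

for dof in (1, 4, 64, 1024):
    w = wishart_samples(dof, n_samples=100_000)
    skew = np.mean(w**3)  # vanishes for a Gaussian; decays ~ dof**-0.5
    print(f"dof={dof:5d}  skewness={skew:+.3f}")
```

Running this prints a skewness that shrinks roughly as 2/sqrt(dof), so the low-dof samples are visibly non-Gaussian while the high-dof samples are nearly Gaussian; this is only a toy analogue of the hyperparameter-controlled limit the paper proves.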