This paper studies the memorization capacity of deep neural networks with ReLU activations. Specifically, we investigate the minimal size of such networks required to memorize any $N$ data points in the unit ball with pairwise separation distance $\delta$ and discrete labels. Most prior studies characterize memorization capacity by the number of parameters or neurons. We generalize these results by constructing neural networks whose width $W$ and depth $L$ satisfy $W^2L^2 = \mathcal{O}(N\log(\delta^{-1}))$ and that can memorize any such $N$ data samples. We also prove that any such network must satisfy the lower bound $W^2L^2 = \Omega(N\log(\delta^{-1}))$, which implies that our construction is optimal up to logarithmic factors when $\delta^{-1}$ is polynomial in $N$. Hence, we explicitly characterize the width-depth trade-off for the memorization capacity of deep neural networks in this regime.
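To make the optimality claim concrete, here is a hedged worked instance (the exponent $c$ is our illustrative assumption, not part of the construction): if $\delta^{-1} = N^{c}$ for some constant $c > 0$, then $\log(\delta^{-1}) = c\log N$, and the two bounds coincide up to constant factors,
\[
W^2L^2 \;=\; \mathcal{O}\bigl(N\log(\delta^{-1})\bigr) \;=\; \mathcal{O}(N\log N)
\qquad\text{and}\qquad
W^2L^2 \;=\; \Omega\bigl(N\log(\delta^{-1})\bigr) \;=\; \Omega(N\log N).
\]
Any width--depth pair with $WL = \Theta\bigl(\sqrt{N\log N}\bigr)$ then fits this budget, e.g., a wide-shallow choice $W = \Theta\bigl(\sqrt{N\log N}\bigr)$, $L = \Theta(1)$, or a narrow-deep choice $W = \Theta(1)$, $L = \Theta\bigl(\sqrt{N\log N}\bigr)$, ignoring any minimum-width requirement the construction may impose.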