The memory capacity of a two-layer neural network with m hidden neurons and input dimension d (i.e., md+m trainable parameters) is the largest size of general data the network can memorize; determining it is a fundamental question in machine learning. For non-polynomial real analytic activation functions, such as sigmoids and smoothed rectified linear units (smoothed ReLUs), we establish a lower bound of md/2 on the memory capacity and show that this bound is optimal up to a factor of approximately 2. Analogous prior results were limited to Heaviside and ReLU activations, while prior results for smooth activations suffer from logarithmic factors and require random data. To analyze the memory capacity, we examine the rank of the network's Jacobian by computing the rank of matrices involving both Hadamard powers and the Khatri-Rao product. This computation extends classical linear-algebraic facts about the rank of Hadamard powers. Overall, our approach differs from previous work on memory capacity and holds promise for extending to deeper models and other architectures.
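To make the Jacobian-rank viewpoint concrete, here is a minimal numerical sketch, not the authors' method or proof technique. It assumes the network has the bias-free form f(x) = sum_j a_j * sigmoid(w_j . x), which matches the md+m parameter count, and uses the sigmoid as the example activation. The helper names `khatri_rao` and `jacobian` are hypothetical. The block shows how the Jacobian with respect to the hidden weights W is a (transposed) column-wise Khatri-Rao product of the matrix of terms a_j * sigmoid'(w_j . x_i) with the data matrix, and then computes its numerical rank on random data. A numerical rank on one random instance is only an illustration; it says nothing about the general-data guarantee stated above.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def khatri_rao(A, B):
    """Column-wise Khatri-Rao product: column n is kron(A[:, n], B[:, n])."""
    p, n_cols = A.shape
    q, n_cols_b = B.shape
    assert n_cols == n_cols_b, "A and B must have the same number of columns"
    # Entry [i * q + j, n] equals A[i, n] * B[j, n], i.e. the Kronecker ordering.
    return np.einsum("in,jn->ijn", A, B).reshape(p * q, n_cols)


def jacobian(X, W, a):
    """Jacobian of f(x) = sum_j a_j * sigmoid(w_j . x) w.r.t. (W, a).

    X: (N, d) data, W: (m, d) hidden weights, a: (m,) output weights.
    Returns an (N, m*d + m) matrix; columns follow W.ravel() then a.
    """
    Z = X @ W.T                 # (N, m) pre-activations w_j . x_i
    S = sigmoid(Z)              # d f(x_i) / d a_j
    D = S * (1.0 - S) * a       # (N, m): a_j * sigmoid'(w_j . x_i)
    # d f(x_i) / d W[j, k] = D[i, j] * X[i, k]; stacking over (j, k) gives
    # the Khatri-Rao product of D^T (m x N) and X^T (d x N), transposed.
    JW = khatri_rao(D.T, X.T).T  # (N, m*d)
    return np.hstack([JW, S])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m, d = 4, 3
    N = m * d // 2  # illustrative sample count at the md/2 scale
    X = rng.standard_normal((N, d))
    W = rng.standard_normal((m, d))
    a = rng.standard_normal(m)
    J = jacobian(X, W, a)
    print(J.shape, np.linalg.matrix_rank(J))  # compare the rank with N
```

The Khatri-Rao structure is what ties the rank question to the data: each row of the W-block is the Kronecker product of a hidden-layer factor with the input x_i. Expanding sigmoid' as a power series would write the hidden-layer factor in terms of Hadamard powers of X W^T, which is the kind of matrix the abstract refers to; the sketch does not carry out that expansion.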