We investigate the sample complexity of bounded two-layer neural networks with different activation functions. In particular, we consider the class $$ \mathcal{H} = \left\{\textbf{x}\mapsto \langle \textbf{v}, \sigma \circ W\textbf{x} + \textbf{b} \rangle : \textbf{b}\in\mathbb{R}^{\mathcal{T}}, W \in \mathbb{R}^{\mathcal{T}\times d}, \textbf{v} \in \mathbb{R}^{\mathcal{T}}\right\} $$ where the spectral norms of $W$ and $\textbf{v}$ are bounded by $O(1)$, the Frobenius distance of $W$ from its initialization is bounded by $R > 0$, and $\sigma$ is a Lipschitz activation function. We prove that if $\sigma$ is element-wise, then the sample complexity of $\mathcal{H}$ depends only logarithmically on the width $\mathcal{T}$, and that this bound is tight up to logarithmic factors. We further show that the element-wise property of $\sigma$ is essential for a bound with logarithmic dependence on the width, in the sense that there exist non-element-wise activation functions for which the sample complexity is linear in the width, for widths that can be up to exponential in the input dimension. For the upper bound, we use the recent approach to norm-based bounds called Approximate Description Length (ADL), introduced in arXiv:1910.05697. We further develop new techniques and tools for this approach, which we hope will inspire future work.
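To make the hypothesis class above concrete, the following is a minimal NumPy sketch of a single member of $\mathcal{H}$ together with a check of the norm constraints. It rests on several assumptions that are not stated in the abstract: the bias is read as sitting inside the activation, i.e. $\sigma(W\textbf{x} + \textbf{b})$; the $O(1)$ bound is fixed to the constant `spectral_bound = 1.0`; and the helper names `two_layer`, `in_class`, `relu`, and `sort_desc` are hypothetical. The sorting map is included only as a simple example of a 1-Lipschitz activation that is not element-wise; it is not claimed to be the construction used for the lower bound.

```python
import numpy as np

def two_layer(x, W, b, v, sigma):
    """Evaluate x -> <v, sigma(W x + b)> for one input x of shape (d,).

    Assumption: the bias b is applied inside the activation.
    """
    return float(v @ sigma(W @ x + b))

def in_class(W, W0, v, R, spectral_bound=1.0):
    """Check the norm constraints described in the text.

    spectral_bound stands in for the O(1) constant (an assumption).
    W0 is the initialization of W; R bounds the Frobenius distance from it.
    """
    spec_ok = (np.linalg.norm(W, 2) <= spectral_bound      # largest singular value of W
               and np.linalg.norm(v) <= spectral_bound)    # Euclidean norm of v
    frob_ok = np.linalg.norm(W - W0, "fro") <= R
    return bool(spec_ok and frob_ok)

def relu(z):
    """Element-wise 1-Lipschitz activation: acts on each coordinate separately."""
    return np.maximum(z, 0.0)

def sort_desc(z):
    """Non-element-wise 1-Lipschitz map (illustrative): each output depends on all inputs."""
    return np.sort(z)[::-1]

# Small example with input dimension d = 3 and width T = 4.
x = np.array([1.0, -2.0, 0.5])
W0 = 0.5 * np.eye(4, 3)          # hypothetical initialization
W = W0.copy()                    # weights that have not moved from initialization
b = np.zeros(4)
v = np.array([0.5, 0.5, 0.5, 0.0])

y_relu = two_layer(x, W, b, v, relu)
y_sort = two_layer(x, W, b, v, sort_desc)
member = in_class(W, W0, v, R=0.1)
```

The only difference between the two evaluations is the activation: `relu` treats each hidden unit independently, whereas `sort_desc` couples all hidden units, which is the distinction the width-dependence results turn on.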