We propose an optimistic estimate to evaluate the best possible fitting performance of nonlinear models. It yields an optimistic sample size, which quantifies the smallest possible sample size needed to fit/recover a target function using a nonlinear model. We estimate the optimistic sample sizes for matrix factorization models, deep models, and deep neural networks (DNNs) with fully-connected or convolutional architectures. For each nonlinear model, our estimates predict a specific subset of targets that can be fitted at overparameterization, and these predictions are confirmed by our experiments. Our optimistic estimate reveals two special properties of DNN models: free expressiveness in width and costly expressiveness in connection. These properties suggest the following architecture design principles for DNNs: (i) feel free to add neurons/kernels; (ii) refrain from connecting neurons. Overall, our optimistic estimate theoretically unveils the vast potential of nonlinear models in fitting at overparameterization. Based on this framework, we anticipate gaining, in the near future, a deeper understanding of how and why numerous nonlinear models such as DNNs can effectively realize their potential in practice.