Deep generative models (DGMs) are data-hungry because learning a complex model on limited data suffers from high variance and easily overfits. Inspired by the classical perspective of the bias-variance tradeoff, we propose the regularized deep generative model (Reg-DGM), which leverages a nontransferable pre-trained model to reduce the variance of generative modeling with limited data. Formally, Reg-DGM optimizes a weighted sum of a divergence and the expectation of an energy function, where the divergence is between the data and model distributions, and the energy function is defined by the pre-trained model and its expectation is taken w.r.t. the model distribution. We analyze a simple yet representative Gaussian-fitting case to demonstrate how the weighting hyperparameter trades off bias against variance. Theoretically, we characterize the existence and uniqueness of the global minimum of Reg-DGM in a non-parametric setting and prove its convergence with neural networks trained by gradient-based methods. Empirically, with various pre-trained feature extractors and a data-dependent energy function, Reg-DGM consistently improves the generation performance of strong DGMs with limited data and achieves results competitive with state-of-the-art methods. Our implementation is available at https://github.com/ML-GSAI/Reg-ADA-APA.
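To make the bias-variance role of the weighting hyperparameter concrete, the following is a minimal sketch of a one-dimensional Gaussian-fitting toy problem. It is an illustration under stated assumptions, not the setup analyzed in the paper: the data are drawn from N(true_mean, 1), the model is N(theta, 1), the divergence is replaced by the empirical squared-error term that maximum likelihood yields, and the energy function is assumed to be f(x) = (x - prior_mean)^2 / 2, where `prior_mean` stands in for the knowledge carried by a pre-trained model. All function and parameter names (`reg_mean_estimate`, `prior_mean`, `lam`) are hypothetical.

```python
import random
import statistics


def reg_mean_estimate(samples, prior_mean, lam):
    """Closed-form minimizer over theta of
    0.5 * mean((x - theta)^2) + lam * 0.5 * (theta - prior_mean)^2.
    The second term equals lam * E_{x ~ N(theta, 1)}[f(x)] up to a constant,
    for the assumed energy f(x) = (x - prior_mean)^2 / 2."""
    xbar = statistics.fmean(samples)
    return (xbar + lam * prior_mean) / (1.0 + lam)


def theoretical_bias_variance(true_mean, prior_mean, lam, n, sigma=1.0):
    """Bias and variance of the estimator above for n i.i.d. samples."""
    bias = lam * (prior_mean - true_mean) / (1.0 + lam)
    variance = sigma ** 2 / (n * (1.0 + lam) ** 2)
    return bias, variance


def simulate(true_mean, prior_mean, lam, n, trials=2000, seed=0):
    """Monte Carlo estimate of bias and variance over repeated small datasets."""
    rng = random.Random(seed)
    estimates = [
        reg_mean_estimate(
            [rng.gauss(true_mean, 1.0) for _ in range(n)], prior_mean, lam
        )
        for _ in range(trials)
    ]
    bias = statistics.fmean(estimates) - true_mean
    variance = statistics.pvariance(estimates)
    return bias, variance


if __name__ == "__main__":
    # Assumed toy values: 5 data points, prior mean slightly off the truth.
    for lam in (0.0, 0.5, 2.0):
        b, v = simulate(true_mean=0.0, prior_mean=0.5, lam=lam, n=5)
        print(f"lambda={lam}: bias={b:.3f}, variance={v:.3f}, mse={b * b + v:.3f}")
```

In this toy setting, increasing `lam` shrinks the variance by a factor of (1 + lam)^2 while pulling the estimate toward `prior_mean`, which introduces bias whenever `prior_mean` differs from the true mean. With the assumed values above, the closed-form mean squared error is 0.2 at lam = 0 and about 0.117 at lam = 0.5, so a moderate weight can help when data are scarce and the prior is reasonably close.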