To overcome the computational bottleneck of data perturbation procedures such as the bootstrap and cross-validation, we propose the Generative Multiple-purpose Sampler (GMS), which constructs a generator function that produces solutions of weighted M-estimators from a given set of weights and tuning parameters. The GMS is trained by a single optimization, without repeatedly evaluating the minimizers of weighted losses, and is thus capable of significantly reducing the computational time. We demonstrate that the GMS framework enables statistical procedures that would be infeasible in a conventional framework, such as the iterated bootstrap, bootstrapped cross-validation for penalized likelihood, and bootstrapped empirical Bayes with nonparametric maximum likelihood. To construct a computationally efficient generator function, we also propose a novel form of neural network, called the \emph{weight multiplicative multilayer perceptron}, that achieves fast convergence. Our numerical results demonstrate that the new neural network structure enjoys a speed advantage of a few orders of magnitude over the conventional one. An R package called GMS is provided, which runs on PyTorch to implement the proposed methods and allows users to supply a customized loss function tailored to their own models of interest.
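The generator idea can be illustrated with a toy sketch. This is not the GMS package or the weight multiplicative multilayer perceptron: it assumes the simplest weighted M-estimator (a weighted mean under squared loss), multinomial bootstrap weights, and a hypothetical linear generator in place of a neural network. All function names are illustrative. The generator is fitted by one stochastic-gradient run that draws fresh weights at every step, after which it returns the weighted solution for new weights without re-solving the weighted problem.

```python
import random

def sample_bootstrap_weights(n, rng):
    """Multinomial(n, 1/n) counts: the weights implied by resampling with replacement."""
    w = [0.0] * n
    for _ in range(n):
        w[rng.randrange(n)] += 1.0
    return w

def generator(params, w):
    """Linear generator G(w) = b + sum_i a_i * w_i (assumed stand-in for a neural network)."""
    a, b = params
    return b + sum(ai * wi for ai, wi in zip(a, w))

def train_generator(y, steps=5000, lr=0.02, seed=0):
    """Fit G with a single SGD run on the weighted loss, redrawing weights each step."""
    rng = random.Random(seed)
    n = len(y)
    a, b = [0.0] * n, 0.0
    for _ in range(steps):
        w = sample_bootstrap_weights(n, rng)
        g = generator((a, b), w)
        # Derivative with respect to G of (1/n) * sum_i w_i * (y_i - G)^2.
        grad_g = -2.0 / n * sum(wi * (yi - g) for wi, yi in zip(w, y))
        # Chain rule: dG/da_i = w_i and dG/db = 1.
        a = [ai - lr * grad_g * wi for ai, wi in zip(a, w)]
        b -= lr * grad_g
    return a, b

def weighted_mean(y, w):
    """Direct minimizer of the weighted squared loss, used only as a reference."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

y = [1.0, 2.0, 4.0, 7.0, 11.0]
params = train_generator(y)

# Compare the trained generator with the directly computed weighted solution.
w_new = [2.0, 0.0, 1.0, 0.0, 2.0]
print(generator(params, w_new), weighted_mean(y, w_new))
```

In this toy case the weighted solution is exactly linear in the weights whenever they sum to the sample size, so a linear generator suffices. Models whose solutions depend nonlinearly on the weights would need a more flexible generator, which is the role the abstract assigns to the neural network.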