Density estimation in high-dimensional settings is an important and challenging statistical problem. Traditional methods based on kernel smoothing are inefficient in high dimensions because appropriate location-adaptive kernels are difficult to specify. In this work, we introduce pre-training, a key idea behind many cutting-edge AI technologies, to non-parametric density estimation. A pre-trained neural network recommends an appropriate location-adaptive kernel for each sample point, which makes efficient density estimation with adaptive kernels achievable in high dimensions. A wide range of numerical experiments shows that this strategy is highly effective at improving density-estimation accuracy when the target distribution is close to the distribution family used for pre-training. When the target distribution differs substantially from the pre-training distribution family, the benefit of the proposed pre-training strategy may be diluted, but it can be recovered by an additional fine-tuning procedure.
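To make the notion of a location-adaptive kernel concrete, the following is a minimal sketch of a sample-point adaptive kernel density estimator, in which every sample point carries its own kernel bandwidth. It is not the method of this work: the isotropic Gaussian kernel is an assumption, and the hypothetical helper `knn_bandwidths` uses a simple k-nearest-neighbor heuristic purely as a stand-in for the pre-trained network that would recommend a kernel for each sample point. All function names and parameter values are illustrative.

```python
import numpy as np


def knn_bandwidths(samples, k=10):
    """Hypothetical stand-in for the pre-trained network.

    Returns one bandwidth per sample point: the distance from that point
    to its k-th nearest neighbor, so kernels are narrow where samples are
    dense and wide where they are sparse.
    """
    diffs = samples[:, None, :] - samples[None, :, :]      # (n, n, d)
    dists = np.sqrt((diffs ** 2).sum(axis=-1))             # (n, n)
    # After sorting each row, column 0 is the zero self-distance,
    # so column k holds the distance to the k-th nearest neighbor.
    return np.sort(dists, axis=1)[:, k]


def adaptive_kde(queries, samples, bandwidths):
    """Sample-point adaptive KDE with isotropic Gaussian kernels (assumed).

    The estimate at x is (1/n) * sum_i N(x; x_i, h_i^2 * I), where h_i is
    the bandwidth attached to sample x_i.
    """
    d = samples.shape[1]
    diffs = queries[:, None, :] - samples[None, :, :]      # (m, n, d)
    sq_dists = (diffs ** 2).sum(axis=-1)                   # (m, n)
    h2 = bandwidths[None, :] ** 2                          # (1, n)
    log_norm = -0.5 * d * np.log(2.0 * np.pi * h2)         # Gaussian normalizer
    kernels = np.exp(log_norm - 0.5 * sq_dists / h2)       # (m, n)
    return kernels.mean(axis=1)                            # (m,)


# Illustrative usage on synthetic data: 500 draws from a 3-D standard normal.
rng = np.random.default_rng(0)
samples = rng.normal(size=(500, 3))
h = knn_bandwidths(samples, k=10)
density_at_origin = adaptive_kde(np.zeros((1, 3)), samples, h)[0]
```

In this sketch the bandwidths are computed from the sample itself; in the setting described above, that step would instead be handled by a network pre-trained on a family of distributions and, if needed, fine-tuned on the target.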