This paper introduces a novel theoretical learning framework, termed probability distribution learning (PD learning). Departing from the traditional statistical learning framework, PD learning focuses on learning the underlying probability distribution, which is modeled as a random variable taking values in the probability simplex. Within this framework, the optimization objective is the learning error, which quantifies the posterior expected discrepancy between the model's predicted distribution and the underlying true distribution, given the available sample data and prior knowledge. To optimize the learning error, this paper proposes necessary conditions on loss functions, models, and optimization algorithms, and shows that these conditions are satisfied in real-world machine learning scenarios. Based on these conditions, the non-convex optimization mechanism underlying model training can be analyzed theoretically. Moreover, this paper establishes model-dependent and model-independent bounds on the learning error, offering new insights into the model's fitting and generalization capabilities. Furthermore, the paper applies the PD learning framework to elucidate the mechanisms by which techniques such as random parameter initialization, over-parameterization, and dropout influence deep model training. Finally, the paper substantiates the key conclusions of the framework with experimental results.
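For concreteness, the learning error described above admits the following schematic form (a minimal sketch using assumed notation, since the abstract itself fixes no symbols):
$$
\mathrm{Err}(\hat{q}) \;=\; \mathbb{E}_{q \sim P(q \mid \mathcal{D})}\big[\, D(\hat{q},\, q) \,\big],
$$
where $q$ denotes the underlying distribution, modeled as a random variable on the probability simplex with posterior $P(q \mid \mathcal{D})$ induced by the sample data $\mathcal{D}$ and the prior knowledge, $\hat{q}$ is the model's predicted distribution, and $D(\cdot,\cdot)$ is a discrepancy measure between distributions.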