We revisit the classical problem of estimating an unknown distribution from its samples by fitting a mixture model that minimizes the cross-entropy loss. Framing the task as a stochastic convex optimization problem over the space of $ M $-component mixture distributions, we propose a family of estimators derived from the stochastic mirror descent (SMD) algorithm. This optimization-based approach provides a principled and flexible framework that generalizes traditional estimators and yields a variety of novel estimators through the choice of Bregman divergence. A key advantage of our method is that it scales efficiently with the number of candidate components $ f_i $; that is, one can employ a large set of basis distributions in the mixture model without incurring significant computational overhead. This enables richer approximations and improved estimation accuracy. Moreover, for categorical distributions (discrete outcomes), our estimators do not require a strict lower bound on the probabilities; in other words, our framework does not require precise knowledge of the support of the distribution. We demonstrate that, under mild conditions, the proposed $ \varphi $-SMD estimators achieve near-optimal convergence rates in both Kullback-Leibler (KL) divergence and $ \ell_2 $-norm, and offer practical benefits when computation is expensive. Our numerical analysis highlights improved performance guarantees over classical estimators, particularly in terms of sample efficiency and scalability.
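To make the setup concrete, the following is a minimal sketch of one member of such a family, assuming the negative-entropy mirror map, for which the mirror descent step becomes an exponentiated-gradient update on the mixture weights. The names `smd_mixture_weights` and `gaussian_pdf`, the step size $ 1/\sqrt{n} $, the iterate averaging, and the Gaussian components are all illustrative assumptions, not the estimators analyzed in the paper.

```python
import numpy as np

def gaussian_pdf(x, mean, std=1.0):
    """Density of N(mean, std^2) at x (hypothetical candidate component f_i)."""
    z = (x - mean) / std
    return np.exp(-0.5 * z * z) / (std * np.sqrt(2.0 * np.pi))

def smd_mixture_weights(F, step_size=None):
    """One pass of entropic mirror descent over the samples.

    F is an (n, M) array with F[t, i] = f_i(X_t), assumed strictly positive.
    Returns the average of the weight iterates, a point on the simplex.
    """
    n, M = F.shape
    if step_size is None:
        step_size = 1.0 / np.sqrt(n)  # assumed schedule, not from the paper
    w = np.full(M, 1.0 / M)           # start from uniform weights
    w_avg = np.zeros(M)
    for t in range(n):
        w_avg += w / n
        mix = F[t] @ w                # mixture density at the sample X_t
        grad = -F[t] / mix            # gradient of -log(sum_i w_i f_i(X_t))
        # Mirror step for the negative-entropy map: multiplicative update,
        # done in log space and then renormalized onto the simplex.
        log_w = np.log(w) - step_size * grad
        log_w -= log_w.max()
        w = np.exp(log_w)
        w /= w.sum()
    return w_avg

# Usage: samples from a two-component mixture with true weights (0.8, 0.2).
rng = np.random.default_rng(0)
n = 2000
from_first = rng.random(n) < 0.8
samples = np.where(from_first, rng.normal(-2.0, 1.0, n), rng.normal(2.0, 1.0, n))
F = np.stack([gaussian_pdf(samples, -2.0), gaussian_pdf(samples, 2.0)], axis=1)
w_hat = smd_mixture_weights(F)
print(w_hat)
```

Each step costs $ O(M) $ once the component densities at the current sample are evaluated, which is consistent with the scalability claim in the abstract; other choices of Bregman divergence would replace the multiplicative update with a different mirror step.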