We consider the problem of estimating multi-dimensional Gaussian mixtures (GMs) with compactly supported or subgaussian mixing distributions. The minimax estimation rate for this class (under the Hellinger, TV and KL divergences) is a long-standing open question, even in one dimension. In this paper we characterize this rate (for all constant dimensions) in terms of the metric entropy of the class. Such characterizations originate from the seminal works of Le Cam (1973), Birge (1983), Haussler and Opper (1997), and Yang and Barron (1999). However, for GMs a key ingredient missing from earlier work (and widely sought after) is a comparison result showing that the KL divergence and the squared Hellinger distance are within a constant multiple of each other uniformly over the class. Our main technical contribution is to establish this fact, from which we derive an entropic characterization of the estimation rate under Hellinger and KL. Interestingly, the sequential (online learning) estimation rate is characterized by the global entropy, while the single-step (batch) rate corresponds to the local entropy, paralleling a similar result for the Gaussian sequence model recently discovered by Neykov (2022) and Mourtada (2023). Additionally, since Hellinger is a proper metric, our comparison shows that GMs under KL satisfy the triangle inequality up to multiplicative constants, implying that proper and improper estimation rates coincide.
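The step from the comparison result to the approximate triangle inequality can be sketched as follows. The notation here is an assumption made for illustration: $H^2(P,Q) = \int (\sqrt{p} - \sqrt{q})^2 \, d\mu$ is the squared Hellinger distance in this normalization, and $C$ denotes a hypothetical constant from the comparison result, which presumably depends on the dimension and on the support or subgaussian parameter of the mixing distributions.

```latex
% Comparison: for all P, Q in the GM class (C is an assumed, class-dependent constant)
\[
  H^2(P,Q) \;\le\; \mathrm{KL}(P \,\|\, Q) \;\le\; C \, H^2(P,Q).
\]
% The left inequality holds for arbitrary P, Q; the right one is the comparison result.

% Consequence: for any P, Q, R in the class,
\begin{align*}
  \mathrm{KL}(P \,\|\, R)
    &\le C \, H^2(P,R)                          && \text{(comparison)} \\
    &\le C \,\bigl(H(P,Q) + H(Q,R)\bigr)^2      && \text{(triangle inequality for } H\text{)} \\
    &\le 2C \,\bigl(H^2(P,Q) + H^2(Q,R)\bigr)   && \text{(}(a+b)^2 \le 2a^2 + 2b^2\text{)} \\
    &\le 2C \,\bigl(\mathrm{KL}(P \,\|\, Q) + \mathrm{KL}(Q \,\|\, R)\bigr)
                                                && \text{(}H^2 \le \mathrm{KL}\text{)}.
\end{align*}
```

Under these assumptions, KL restricted to the class behaves like a squared metric up to the factor $2C$.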