We study density estimation in Kullback-Leibler divergence: given an i.i.d. sample from an unknown density $p$, the goal is to construct an estimator $\widehat p$ such that $\mathrm{KL}(p \,\|\, \widehat p)$ is small with high probability. We consider two settings involving a finite dictionary of $M$ densities: (i) model aggregation, where $p$ belongs to the dictionary, and (ii) convex aggregation (mixture density estimation), where $p$ is a mixture of densities from the dictionary. Crucially, we make no assumption on the base densities: their ratios may be unbounded and their supports may differ. For both problems, we identify the best possible high-probability guarantees in terms of the dictionary size, sample size, and confidence level. These optimal rates are higher than those achievable when density ratios are bounded by absolute constants; for mixture density estimation, they match existing lower bounds in the special case of discrete distributions. Our analysis of the mixture case hinges on two new covering results. First, we provide a sharp, distribution-free upper bound on the local Hellinger entropy of the class of mixtures of $M$ distributions. Second, we prove an optimal ratio covering theorem for convex sets: for every convex compact set $K\subset \mathbb{R}_+^d$, there exists a subset $A\subset K$ with at most $2^{8d}$ elements such that each element of $K$ is coordinate-wise dominated by an element of $A$ up to a universal constant factor. This geometric result is of independent interest; notably, it yields new cardinality estimates for $\varepsilon$-approximate Pareto sets in multi-objective optimization when the attainable set of objective vectors is convex.