We study density estimation in Kullback-Leibler divergence: given an i.i.d. sample from an unknown density $p^\star$, the goal is to construct an estimator $\widehat{p}$ such that $\mathrm{KL}(p^\star,\widehat{p})$ is small with high probability. We consider two fundamental settings involving a finite dictionary of densities: (i) model aggregation, where $p^\star$ belongs to the dictionary, and (ii) convex aggregation (mixture density estimation), where $p^\star$ is a mixture of densities from the dictionary. Crucially, we make no assumption on the base densities: their ratios may be unbounded and their supports may differ. For both problems, we identify the best possible high-probability guarantees in terms of the dictionary size, sample size, and confidence level. These optimal rates are higher than those achievable when density ratios are bounded by absolute constants; for mixture density estimation, they match existing lower bounds in the special case of discrete distributions. Our analysis of the mixture case hinges on two new covering results. First, we provide a sharp, distribution-free upper bound on the local Hellinger entropy of the class of mixtures of $M$ distributions. Second, we prove an optimal ratio covering theorem for convex sets: for every convex compact set $K \subset \mathbb{R}_+^d$, there exists a subset $A \subset K$ with at most $2^{O(d)}$ elements such that each element of $K$ is coordinate-wise dominated by an element of $A$ up to a universal constant factor. This geometric result is of independent interest; notably, it yields new cardinality estimates for $\varepsilon$-approximate Pareto sets in multi-objective optimization with convex feasible set.
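Written out in symbols, the ratio covering theorem stated above reads as follows. The names $C_1$ and $C_2$ for the two universal constants are notation assumed here for illustration and are not taken from the text; $x_i$ and $a_i$ denote coordinates.

$$
\forall\, K \subset \mathbb{R}_+^d \ \text{convex compact},\quad \exists\, A \subset K \ \text{with}\ |A| \le 2^{C_1 d} \ \text{such that}\quad \forall\, x \in K\ \ \exists\, a \in A:\quad x_i \le C_2\, a_i \quad (i = 1, \dots, d).
$$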