We consider the task of estimating a conditional density using i.i.d. samples from a joint distribution, which is a fundamental problem with applications in both classification and uncertainty quantification for regression. For joint density estimation, minimax rates have been characterized for general density classes in terms of uniform (metric) entropy, a well-studied notion of statistical capacity. When applying these results to conditional density estimation, the use of uniform entropy -- which is infinite when the covariate space is unbounded and suffers from the curse of dimensionality -- can lead to suboptimal rates. Consequently, minimax rates for conditional density estimation cannot be characterized using these classical results. We resolve this problem for well-specified models, obtaining matching (within logarithmic factors) upper and lower bounds on the minimax Kullback--Leibler risk in terms of the empirical Hellinger entropy for the conditional density class. The use of empirical entropy allows us to appeal to concentration arguments based on local Rademacher complexity, which -- in contrast to uniform entropy -- leads to matching rates for large, potentially nonparametric classes and captures the correct dependence on the complexity of the covariate space. Our results require only that the conditional densities are bounded above, and do not require that they are bounded below or otherwise satisfy any tail conditions.