We study the problem of estimating a distribution over a finite alphabet from an i.i.d. sample, with accuracy measured in relative entropy (Kullback-Leibler divergence). While optimal bounds on the expected risk are known, high-probability guarantees remain less well understood. First, we analyze the classical Laplace (add-one) estimator, obtaining matching upper and lower bounds on its performance and establishing its optimality among confidence-independent estimators. We then characterize the minimax-optimal high-probability risk and show that it is achieved by a simple confidence-dependent smoothing technique. Notably, the optimal non-asymptotic risk incurs an additional logarithmic factor relative to the ideal asymptotic rate. Next, motivated by regimes in which the alphabet size exceeds the sample size, we investigate methods that adapt to the sparsity of the underlying distribution. We introduce an estimator based on data-dependent smoothing, for which we establish a high-probability risk bound depending on two effective sparsity parameters. As part of our analysis, we also derive a sharp high-probability upper bound on the missing mass.
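For concreteness, the Laplace (add-one) estimator sets $\hat{p}_j = (N_j + 1)/(n + k)$, where $N_j$ is the count of symbol $j$ in a sample of size $n$ over an alphabet of size $k$; confidence-dependent smoothing instead adds a constant tied to the target failure probability $\delta$. The following sketch illustrates both on simulated data, with the caveat that the $\delta$-dependent smoothing parameter used here is an arbitrary placeholder for illustration, not the optimal choice derived in the paper.

```python
# Sketch: add-constant smoothing estimators and their KL error on simulated
# data. The delta-dependent smoothing parameter is an illustrative
# placeholder, not the paper's derived choice.
import numpy as np

def add_constant_estimator(counts, alpha):
    """Add-alpha smoothing: p_hat_j = (N_j + alpha) / (n + alpha * k)."""
    counts = np.asarray(counts, dtype=float)
    n, k = counts.sum(), counts.size
    return (counts + alpha) / (n + alpha * k)

def kl_divergence(p, q):
    """KL(p || q); terms with p_j = 0 contribute zero."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

rng = np.random.default_rng(0)
k, n = 20, 200
p = rng.dirichlet(np.ones(k))                           # true distribution
counts = np.bincount(rng.choice(k, size=n, p=p), minlength=k)

p_laplace = add_constant_estimator(counts, alpha=1.0)   # classical add-one
print("KL(p || Laplace):  ", kl_divergence(p, p_laplace))

delta = 0.01                                  # target failure probability
alpha_delta = 1.0 / np.log(1.0 / delta)       # placeholder dependence on delta
p_conf = add_constant_estimator(counts, alpha_delta)
print("KL(p || add-alpha):", kl_divergence(p, p_conf))
```

Since every smoothed probability is strictly positive, the KL divergence from the true distribution is always finite, which is precisely why unsmoothed empirical frequencies fail under this loss.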
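The missing mass, $\sum_{j : N_j = 0} p_j$, is the total true probability of symbols absent from the sample. A small simulation (again only a sketch, with arbitrary parameters) illustrates its typical size and upper tail in the sparse regime where $k$ exceeds $n$; the paper's bound is analytic, not simulated.

```python
# Sketch: simulating the missing mass (total true probability of unseen
# symbols) when the alphabet size k exceeds the sample size n.
import numpy as np

rng = np.random.default_rng(1)
k, n, trials = 1000, 200, 500
p = rng.dirichlet(np.ones(k))            # true distribution on k symbols

masses = np.empty(trials)
for t in range(trials):
    counts = np.bincount(rng.choice(k, size=n, p=p), minlength=k)
    masses[t] = p[counts == 0].sum()     # mass on symbols never observed

print("mean missing mass:", masses.mean())
print("0.99 quantile:    ", np.quantile(masses, 0.99))
```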