Entropy estimation plays a crucial role in fields such as information theory, statistical data science, and machine learning. However, traditional entropy estimation methods often struggle with complex data distributions. Mixture-based entropy estimation has recently been proposed and has gained attention for its ease of use and accuracy. This paper presents a novel approach to quantifying the uncertainty of the mixture-based entropy estimate using the weighted likelihood bootstrap. Unlike standard methods, our approach leverages the underlying mixture structure by assigning random weights to observations in a weighted likelihood bootstrap procedure, leading to more accurate uncertainty estimation. We also investigate how the weights are generated and propose drawing them from a Dirichlet distribution with parameter $\alpha = 0.8137$ instead of the usual $\alpha = 1$. Furthermore, centered percentile intervals emerge as the preferred choice for ensuring empirical coverage close to the nominal level. Extensive simulation studies comparing different resampling strategies are presented and their results discussed. The proposed approach is illustrated by analyzing the log-returns of daily gold prices at COMEX for the years 2014--2022 and the Net Rating scores, an advanced statistic used in basketball analytics, of NBA teams for the 2022/23 regular season.
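The procedure summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions that are not taken from the paper: the data are univariate, the mixture is Gaussian with a fixed number of components `k`, the weighted fit uses a small hand-written EM routine, the entropy of each replicate is taken as the weighted average of the negative log-density, and the "centered percentile" interval is interpreted as the interval obtained by reflecting the bootstrap percentiles around the point estimate. All function names (`log_mixture_density`, `weighted_em`, `entropy_estimate`, `wlb_entropy_interval`) are hypothetical.

```python
import numpy as np


def log_mixture_density(x, pi, mu, sigma):
    """Return log f(x) and the per-component log terms of a 1-D Gaussian mixture."""
    z = (x[:, None] - mu[None, :]) / sigma[None, :]
    log_comp = (np.log(pi)[None, :] - np.log(sigma)[None, :]
                - 0.5 * np.log(2.0 * np.pi) - 0.5 * z ** 2)
    # log-sum-exp over components for numerical stability
    m = log_comp.max(axis=1, keepdims=True)
    log_f = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return log_f, log_comp


def weighted_em(x, w, k=2, n_iter=200, init=None):
    """Weighted maximum likelihood fit of a k-component Gaussian mixture via EM."""
    w = w / w.sum()
    if init is None:
        # simple quantile-based start (an assumption of this sketch)
        mu = np.quantile(x, (np.arange(k) + 0.5) / k)
        sigma = np.full(k, x.std())
        pi = np.full(k, 1.0 / k)
    else:
        pi, mu, sigma = (np.array(p, dtype=float) for p in init)
    for _ in range(n_iter):
        # E-step: posterior component probabilities
        log_f, log_comp = log_mixture_density(x, pi, mu, sigma)
        resp = np.exp(log_comp - log_f[:, None])
        # M-step: observation weights enter every sufficient statistic
        wr = w[:, None] * resp
        nk = np.maximum(wr.sum(axis=0), 1e-12)
        pi = nk / nk.sum()
        mu = (wr * x[:, None]).sum(axis=0) / nk
        var = (wr * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / nk
        sigma = np.sqrt(np.maximum(var, 1e-8))
    return pi, mu, sigma


def entropy_estimate(x, w, params):
    """Weighted average of -log f(x_i) under the fitted mixture."""
    log_f, _ = log_mixture_density(x, *params)
    return -np.sum(w / w.sum() * log_f)


def wlb_entropy_interval(x, k=2, B=200, alpha_dir=0.8137, level=0.95, seed=0):
    """Weighted likelihood bootstrap for the mixture-based entropy estimate."""
    rng = np.random.default_rng(seed)
    n = x.size
    ones = np.ones(n)
    fit = weighted_em(x, ones, k)            # fit with equal weights
    h_hat = entropy_estimate(x, ones, fit)   # point estimate
    h_boot = np.empty(B)
    for b in range(B):
        # random observation weights from a symmetric Dirichlet(alpha_dir)
        w = rng.dirichlet(np.full(n, alpha_dir))
        # warm-start each weighted refit from the equal-weight solution
        fit_b = weighted_em(x, w, k, n_iter=50, init=fit)
        h_boot[b] = entropy_estimate(x, w, fit_b)
    a = (1.0 - level) / 2.0
    q_lo, q_hi = np.quantile(h_boot, [a, 1.0 - a])
    # percentiles reflected around the point estimate (assumed interpretation)
    return h_hat, (2.0 * h_hat - q_hi, 2.0 * h_hat - q_lo), h_boot
```

Calling `wlb_entropy_interval(x)` on a one-dimensional NumPy array returns the point estimate, the interval, and the bootstrap replicates. Setting `alpha_dir=1.0` gives the usual uniform Dirichlet weights, so the two weighting schemes mentioned above can be compared by changing a single argument.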