This work reviews 45 estimators of the Shannon differential entropy. The estimators fall mainly into three classes: window-size spacings, kernel density estimation (KDE), and k-nearest neighbour (kNN) estimation. From these classes, 16, 5, and 6 estimators, respectively, were selected for comparison. The performance of the 27 selected estimators was compared through extensive Monte Carlo simulations in terms of bias, root mean squared error (RMSE), and asymptotic behaviour. The empirical comparisons were carried out at sample sizes of 10, 50, and 100 and variable dimensions of 1, 2, 3, and 5, for three groups of continuous distributions classified by symmetry and support. The results show that the spacings-based estimators generally outperform the estimators from the other two classes in the univariate case, but do not exist in the multivariate case. The kNN-based estimators are generally inferior to the estimators from the other two classes, but have the advantage of existing in all dimensions. In addition, a new class of optimal window sizes was obtained, and sets of estimators are recommended for the different groups of distributions at each variable dimension. Finally, the asymptotic biases, variances, and distributions of the 'best estimators' are considered.
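To make the comparison criteria concrete, the following is a minimal sketch of a Monte Carlo estimate of bias and RMSE for a single spacings-based estimator. It uses Vasicek's classic m-spacing estimator purely as an illustration; the abstract does not say which estimators were selected, and the function names, the window sizes, the number of replications, and the choice of the standard normal as the test distribution are all assumptions made here.

```python
import numpy as np

def vasicek_entropy(x, m):
    """Vasicek's m-spacing estimate of differential entropy (univariate only)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if not 1 <= m < n / 2:
        raise ValueError("window size m must satisfy 1 <= m < n/2")
    idx = np.arange(n)
    # Order statistics beyond the sample range are clamped to the extremes.
    upper = x[np.minimum(idx + m, n - 1)]
    lower = x[np.maximum(idx - m, 0)]
    return float(np.mean(np.log(n / (2.0 * m) * (upper - lower))))

def monte_carlo_bias_rmse(n, m, reps=2000, seed=0):
    """Bias and RMSE of the estimator on N(0, 1) samples (illustrative setup)."""
    rng = np.random.default_rng(seed)
    true_h = 0.5 * np.log(2.0 * np.pi * np.e)  # exact entropy of N(0, 1)
    est = np.array([vasicek_entropy(rng.standard_normal(n), m)
                    for _ in range(reps)])
    bias = est.mean() - true_h
    rmse = np.sqrt(np.mean((est - true_h) ** 2))
    return bias, rmse

if __name__ == "__main__":
    # Window sizes below are arbitrary illustrative choices, not the paper's.
    for n, m in [(10, 3), (50, 7), (100, 10)]:
        bias, rmse = monte_carlo_bias_rmse(n, m)
        print(f"n={n:3d} m={m:2d} bias={bias:+.4f} rmse={rmse:.4f}")
```

The estimator depends on the sorted sample, which is why this construction has no direct multivariate analogue, whereas kNN distances are defined in any dimension.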