The Shannon entropy is a fundamental measure for quantifying diversity and model complexity in fields such as information theory, ecology, and genetics. However, many existing studies assume that the number of species is known, an assumption that is often unrealistic in practice. In recent years, efforts have been made to relax this restriction. Motivated by these developments, this study proposes an entropy estimation method based on the Pitman--Yor process, a representative approach in Bayesian nonparametrics. By approximating the true distribution as an infinite-dimensional process, the proposed method enables stable estimation even when the number of observed species is smaller than the true number of species. This approach provides a principled way to deal with the uncertainty in species diversity and enhances the reliability and robustness of entropy-based diversity assessment. In addition, we investigate the convergence property of the Shannon entropy for regularly varying distributions and use this result to establish the consistency of the proposed estimator. Finally, we demonstrate the effectiveness of the proposed method through numerical experiments.
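To make the motivating problem concrete, the sketch below illustrates why entropy estimation is hard when most species go unobserved. It is not the Pitman--Yor estimator proposed here. It only computes the naive plug-in estimate (the entropy of the empirical frequencies) on a small sample from a heavy-tailed distribution. The truncated power-law distribution, its exponent, the number of species, the sample size, and all function names are assumptions chosen for illustration. Since a sample of size n contains at most n distinct species, the plug-in estimate can never exceed log n, so it necessarily falls short whenever the true entropy is larger than that.

```python
import math
import random
from collections import Counter


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def plugin_entropy(sample):
    """Plug-in estimate: entropy of the empirical species frequencies."""
    n = len(sample)
    counts = Counter(sample)
    return shannon_entropy(c / n for c in counts.values())


def power_law_probs(num_species, exponent):
    """Probabilities p_k proportional to k**(-exponent), k = 1..num_species.

    Hypothetical heavy-tailed example, assumed here for illustration only.
    """
    weights = [k ** (-exponent) for k in range(1, num_species + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Assumed setup: 1000 true species, but only 50 observations.
rng = random.Random(0)
probs = power_law_probs(num_species=1000, exponent=1.2)
sample = rng.choices(range(len(probs)), weights=probs, k=50)

true_h = shannon_entropy(probs)
est_h = plugin_entropy(sample)
observed_species = len(set(sample))

print(f"true species:      {len(probs)}")
print(f"observed species:  {observed_species}")
print(f"true entropy:      {true_h:.3f} nats")
print(f"plug-in estimate:  {est_h:.3f} nats")
print(f"upper bound log n: {math.log(len(sample)):.3f} nats")
```

In this undersampled setting the plug-in estimate is capped by log n regardless of the random seed, which is the kind of systematic shortfall that estimators allowing for unseen species aim to reduce.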