The Beta kernel estimator offers a theoretically superior alternative to the Gaussian kernel for data on the unit interval, eliminating boundary bias without requiring reflection or transformation. However, its adoption remains limited by the lack of a reliable bandwidth selector; practitioners currently rely on iterative optimization methods that are computationally expensive and prone to instability. We derive the ``\rot,'' a fast, closed-form bandwidth selector based on the unweighted Asymptotic Mean Integrated Squared Error (AMISE) of a beta reference distribution. To address boundary integrability issues, we introduce a principled heuristic for U-shaped and J-shaped distributions. By employing a method-of-moments approximation, we reduce the bandwidth selection complexity from iterative optimization to $\mathcal{O}(1)$. Extensive Monte Carlo simulations demonstrate that our rule matches the accuracy of numerical optimization while delivering a speedup of more than 35,000 times. Validation on real-world socioeconomic data shows that it avoids the ``vanishing boundary'' and ``shoulder'' artifacts common to Gaussian-based methods. We provide a comprehensive, open-source Python package to facilitate the immediate adoption of the Beta kernel as a drop-in replacement for standard density estimation tools.
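For orientation, the two ingredients named above can be sketched in a few lines of Python. This is a minimal illustration, not the accompanying package: it assumes the standard beta kernel form, in which the estimate at a point x averages Beta(x/b + 1, (1 - x)/b + 1) densities over the sample, and it assumes the usual method-of-moments fit of a beta reference distribution. The function names `beta_logpdf`, `beta_kde`, and `beta_mom` are hypothetical, and the closed-form rule itself is not reproduced here, so the bandwidth `b` is supplied by the caller.

```python
import math
from statistics import fmean, variance


def beta_logpdf(t, p, q):
    """Log density of Beta(p, q) at t; requires 0 < t < 1."""
    log_norm = math.lgamma(p + q) - math.lgamma(p) - math.lgamma(q)
    return log_norm + (p - 1.0) * math.log(t) + (q - 1.0) * math.log1p(-t)


def beta_kde(x, data, b):
    """Beta kernel density estimate at x in [0, 1] with bandwidth b > 0.

    Assumed form: the kernel is a Beta(x/b + 1, (1 - x)/b + 1) density
    evaluated at each observation, so its shape adapts to x and no mass
    is placed outside [0, 1]. Observations must lie strictly in (0, 1).
    """
    p, q = x / b + 1.0, (1.0 - x) / b + 1.0
    return fmean(math.exp(beta_logpdf(t, p, q)) for t in data)


def beta_mom(data):
    """Method-of-moments estimates (alpha, beta) of a beta reference distribution."""
    m, v = fmean(data), variance(data)
    k = m * (1.0 - m) / v - 1.0
    if k <= 0.0:
        # Sample variance too large for any beta distribution with this mean.
        raise ValueError("moments are incompatible with a beta distribution")
    return m * k, (1.0 - m) * k
```

A typical use would call `beta_mom(data)` once to obtain the reference parameters, feed them into a bandwidth rule, and then evaluate `beta_kde(x, data, b)` on a grid of points in [0, 1]. The moment fit costs a single pass over the data, which is what makes a closed-form selector cheap compared with iterative optimization.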