The Beta kernel estimator offers a theoretically superior alternative to the Gaussian kernel for data on the unit interval, eliminating boundary bias without requiring reflection or transformation. However, its adoption remains limited by the lack of a reliable bandwidth selector; practitioners currently rely on iterative optimization methods that are computationally expensive and prone to instability. We derive the ``Beta Reference Rule,'' a fast, closed-form bandwidth selector based on the unweighted Asymptotic Mean Integrated Squared Error (AMISE) of a beta reference distribution. To address boundary integrability issues, we introduce a principled heuristic for U-shaped and J-shaped distributions. By employing a method-of-moments approximation, we reduce bandwidth selection from an iterative optimization to an $\mathcal{O}(1)$ computation. Extensive Monte Carlo simulations demonstrate that our rule matches the accuracy of numerical optimization while delivering a speedup of more than 35,000 times. Real-world validation on socioeconomic data shows that it avoids the ``vanishing boundary'' and ``shoulder'' artifacts common to Gaussian-based methods. We provide a comprehensive, open-source Python package to facilitate immediate adoption of the Beta kernel as a drop-in replacement for standard density estimation tools.
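To make the two ingredients named above concrete, the following is a minimal sketch, not the package's implementation. It assumes the standard (unmodified) beta kernel form, in which the estimate at $x$ averages the $\mathrm{Beta}(x/h + 1, (1 - x)/h + 1)$ density over the sample, and the usual method-of-moments fit of a beta reference distribution. The function names and the bandwidth value `h = 0.05` are hypothetical placeholders; the closed-form Beta Reference Rule itself is not reproduced here. All observations are assumed to lie strictly inside $(0, 1)$.

```python
import math


def beta_pdf(t, a, b):
    """Density of Beta(a, b) at t, for t strictly inside (0, 1)."""
    if not 0.0 < t < 1.0:
        raise ValueError("t must lie strictly inside (0, 1)")
    # Work on the log scale so large shape parameters do not overflow.
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(t) + (b - 1.0) * math.log(1.0 - t))


def beta_kernel_density(x, data, h):
    """Beta kernel density estimate at x in [0, 1] with bandwidth h > 0.

    Assumed form: the average over the sample of the
    Beta(x/h + 1, (1 - x)/h + 1) density evaluated at each observation.
    """
    a = x / h + 1.0
    b = (1.0 - x) / h + 1.0
    return sum(beta_pdf(t, a, b) for t in data) / len(data)


def beta_moment_fit(data):
    """Method-of-moments shape parameters of a beta reference distribution."""
    n = len(data)
    mean = sum(data) / n
    var = sum((t - mean) ** 2 for t in data) / (n - 1)
    # A beta fit exists only when var < mean * (1 - mean).
    common = mean * (1.0 - mean) / var - 1.0
    if common <= 0.0:
        raise ValueError("sample variance is too large for a beta fit")
    return mean * common, (1.0 - mean) * common


if __name__ == "__main__":
    sample = [0.2, 0.4, 0.6, 0.8]
    h = 0.05  # hypothetical placeholder, not the Beta Reference Rule
    print(beta_moment_fit(sample))
    print([beta_kernel_density(x, sample, h) for x in (0.0, 0.5, 1.0)])
```

Because the kernel's shape changes with the evaluation point while its support stays on $[0, 1]$, the estimate can be evaluated at $x = 0$ and $x = 1$ directly, with no reflection or transformation step.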