The Beta kernel estimator offers a theoretically superior alternative to the Gaussian kernel for data on the unit interval, eliminating boundary bias without requiring reflection or transformation. However, its adoption remains limited by the lack of a reliable bandwidth selector; practitioners currently rely on iterative optimization methods that are computationally expensive and prone to instability. We derive the ``Beta Reference Rule,'' a fast, closed-form bandwidth selector based on the unweighted Asymptotic Mean Integrated Squared Error (AMISE) under a beta reference distribution. To address the boundary integrability issues that arise for U-shaped and J-shaped distributions, we introduce a principled heuristic for these cases. By employing a method-of-moments approximation, we reduce bandwidth selection from an iterative optimization to an $\mathcal{O}(1)$ closed-form evaluation. Extensive Monte Carlo simulations demonstrate that our rule matches the accuracy of numerical optimization while running more than 35,000 times faster. Real-world validation on socioeconomic data shows that it avoids the ``vanishing boundary'' and ``shoulder'' artifacts common to Gaussian-based methods. We provide a comprehensive, open-source Python package so that the Beta kernel can be adopted as a drop-in replacement for standard density estimation tools.
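For orientation, the two ingredients named above can be sketched in a few lines of Python. This is a minimal illustration, not the accompanying package: the function names `beta_kernel_density` and `beta_moment_fit` are hypothetical, the estimator uses the standard Beta kernel form usually attributed to Chen (1999), and the bandwidth value is an arbitrary placeholder rather than the output of the Beta Reference Rule, whose closed form is not reproduced here.

```python
import numpy as np
from scipy.stats import beta


def beta_kernel_density(x_grid, samples, bandwidth):
    """Beta kernel density estimate on [0, 1] (standard Chen-type form).

    At each evaluation point x, the kernel is a Beta(x/b + 1, (1 - x)/b + 1)
    density evaluated at the observations, so no kernel mass falls outside
    the unit interval.
    """
    x = np.asarray(x_grid, dtype=float)[:, None]       # shape (m, 1)
    data = np.asarray(samples, dtype=float)[None, :]   # shape (1, n)
    shape_a = x / bandwidth + 1.0
    shape_b = (1.0 - x) / bandwidth + 1.0
    # Average the kernel values over the observations for each grid point.
    return beta.pdf(data, shape_a, shape_b).mean(axis=1)


def beta_moment_fit(samples):
    """Method-of-moments estimates (alpha, beta) of a beta reference distribution."""
    data = np.asarray(samples, dtype=float)
    mean = data.mean()
    var = data.var(ddof=1)
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common


# Illustrative usage on synthetic data (assumed, not from the paper).
rng = np.random.default_rng(0)
samples = rng.beta(2.0, 5.0, size=500)
grid = np.linspace(0.0, 1.0, 201)

placeholder_bandwidth = 0.05  # arbitrary value, NOT the Beta Reference Rule
density = beta_kernel_density(grid, samples, placeholder_bandwidth)
alpha_hat, beta_hat = beta_moment_fit(samples)
```

The moment fit costs one pass over the data to obtain the sample mean and variance; after that, any closed-form rule written in terms of the fitted parameters is a constant-time evaluation, which is the sense in which an iterative search can be avoided.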