Estimating cumulative distribution functions (CDFs) and probability density functions (PDFs) is a fundamental task in applied statistics. Challenges arise, however, when the data are available only in grouped intervals. In this paper, we discuss a highly flexible non-parametric density estimation approach for binned distributions based on monotonicity-preserving cubic splines, a form of cubic spline interpolation. Simulation studies show that this approach outperforms many widely used heuristic methods. Applying the method to a dataset of train delays in Germany and to microcensus data on distance and travel time to work yields some meaningful and some questionable results.
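To make the idea concrete, the following is a minimal sketch of density estimation from binned counts via monotone cubic interpolation of the empirical CDF at the bin edges. It assumes the Fritsch-Carlson slope limiter as the monotonicity-preserving scheme, which may differ from the construction used in the paper. The function name `monotone_cdf_spline` and the example bins are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def monotone_cdf_spline(edges, counts):
    """Return (cdf, pdf) callables built from bin edges and bin counts.

    Assumption: Fritsch-Carlson monotone cubic Hermite interpolation is used;
    this is one standard scheme, not necessarily the paper's.
    """
    n = sum(counts)
    k = len(counts)
    # Empirical CDF at the bin edges (the interpolation knots).
    F = [0.0]
    for c in counts:
        F.append(F[-1] + c / n)
    h = [edges[i + 1] - edges[i] for i in range(k)]   # bin widths
    d = [(F[i + 1] - F[i]) / h[i] for i in range(k)]  # secant slopes = histogram heights

    # Initial knot slopes: one-sided at the ends, mean of neighbouring secants
    # inside, and zero wherever a neighbouring bin is empty.
    m = [d[0]]
    for i in range(1, k):
        m.append(0.0 if d[i - 1] * d[i] <= 0 else (d[i - 1] + d[i]) / 2)
    m.append(d[-1])

    # Fritsch-Carlson limiter: shrink slopes so each cubic piece stays monotone.
    for i in range(k):
        if d[i] == 0:
            m[i] = m[i + 1] = 0.0
            continue
        a, b = m[i] / d[i], m[i + 1] / d[i]
        s = a * a + b * b
        if s > 9:
            t = 3 / s ** 0.5
            m[i], m[i + 1] = t * a * d[i], t * b * d[i]

    def locate(x):
        # Index of the bin containing x and the relative position t in [0, 1].
        i = min(max(bisect_right(edges, x) - 1, 0), k - 1)
        return i, (x - edges[i]) / h[i]

    def cdf(x):
        if x <= edges[0]:
            return 0.0
        if x >= edges[-1]:
            return 1.0
        i, t = locate(x)
        # Cubic Hermite basis functions.
        h00 = (1 + 2 * t) * (1 - t) ** 2
        h10 = t * (1 - t) ** 2
        h01 = t * t * (3 - 2 * t)
        h11 = t * t * (t - 1)
        return h00 * F[i] + h10 * h[i] * m[i] + h01 * F[i + 1] + h11 * h[i] * m[i + 1]

    def pdf(x):
        if x < edges[0] or x > edges[-1]:
            return 0.0
        i, t = locate(x)
        # Derivatives of the Hermite basis with respect to t.
        dh00 = 6 * t * t - 6 * t
        dh10 = 3 * t * t - 4 * t + 1
        dh01 = -6 * t * t + 6 * t
        dh11 = 3 * t * t - 2 * t
        return (dh00 * F[i] + dh01 * F[i + 1]) / h[i] + dh10 * m[i] + dh11 * m[i + 1]

    return cdf, pdf


# Hypothetical binned data: delays in minutes, with unequal bin widths.
edges = [0, 10, 20, 30, 60]
counts = [5, 20, 15, 10]
cdf, pdf = monotone_cdf_spline(edges, counts)
```

Because the spline interpolates the cumulative shares at every bin edge, the estimated density reproduces the observed probability mass of each bin exactly, and the monotonicity constraint keeps the density non-negative.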