In this work, we discuss a general class of estimators for the cumulative distribution function (CDF) based on the judgment post-stratification (JPS) sampling scheme, which includes both the empirical and kernel distribution functions. Specifically, we obtain the expectation of the estimators in this class and show that they are asymptotically more efficient than their competitors in simple random sampling (SRS), as long as the rankings are better than random guessing. We find a mild condition that is necessary and sufficient for them to be asymptotically unbiased. We also prove that, under the same condition, the estimators in this class are strongly uniformly consistent estimators of the true CDF and converge in distribution to a normal distribution as the sample size goes to infinity. We then focus on the kernel distribution function (KDF) in the JPS design and obtain the optimal bandwidth. Next, we carry out a comprehensive Monte Carlo simulation to compare the performance of the KDF in the JPS design with its counterpart in the SRS design for different choices of sample size, set size, ranking quality, parent distribution, and kernel function, under both perfect and imperfect ranking. We find that the JPS estimator dramatically improves the efficiency of the KDF relative to its SRS competitor over a wide range of settings. Finally, we apply the described procedure to a real dataset from a medical context to show its usefulness and applicability in practice.
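To make the JPS construction concrete, the following is a minimal sketch and not the authors' implementation. It assumes perfect ranking, a standard normal parent distribution, a Gaussian kernel, and an arbitrary placeholder bandwidth (not the optimal bandwidth derived in the paper). The function names `draw_jps_sample` and `jps_cdf` are hypothetical, and the estimator form used here (averaging the within-stratum means over the non-empty judgment strata) is assumed as the usual JPS form rather than taken from the text above.

```python
import math

import numpy as np

# Vectorized error function; otypes is set so empty inputs are handled.
_erf = np.vectorize(math.erf, otypes=[float])


def gaussian_kernel_cdf(z):
    """CDF of the standard normal, used as the integrated kernel (assumption)."""
    return 0.5 * (1.0 + _erf(np.asarray(z, dtype=float) / math.sqrt(2.0)))


def draw_jps_sample(n, set_size, rng):
    """Draw n measured units; rank each within its own comparison set.

    Assumes perfect ranking and a standard normal parent distribution.
    Returns (values, ranks) with ranks in 1..set_size.
    """
    values = rng.standard_normal(n)
    ranks = np.empty(n, dtype=int)
    for i in range(n):
        # set_size - 1 extra units are used for ranking only, never measured.
        extras = rng.standard_normal(set_size - 1)
        ranks[i] = 1 + int(np.sum(extras < values[i]))
    return values, ranks


def jps_cdf(t, values, ranks, set_size, bandwidth=None):
    """JPS estimate of the CDF at the points t.

    bandwidth=None gives the empirical version (indicator functions);
    a positive bandwidth gives the kernel version.
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    values = np.asarray(values, dtype=float)
    ranks = np.asarray(ranks)
    diff = t[:, None] - values[None, :]
    if bandwidth is None:
        contrib = (diff >= 0).astype(float)
    else:
        contrib = gaussian_kernel_cdf(diff / bandwidth)
    # Average within each non-empty judgment stratum, then across strata.
    stratum_means = [
        contrib[:, ranks == h].mean(axis=1)
        for h in range(1, set_size + 1)
        if np.any(ranks == h)
    ]
    return np.mean(stratum_means, axis=0)


rng = np.random.default_rng(0)
values, ranks = draw_jps_sample(n=50, set_size=3, rng=rng)
grid = np.linspace(-2.0, 2.0, 5)
print(jps_cdf(grid, values, ranks, set_size=3))                 # empirical
print(jps_cdf(grid, values, ranks, set_size=3, bandwidth=0.3))  # kernel
```

With `set_size=1` every unit has rank 1, so the sketch reduces to the ordinary SRS empirical or kernel distribution function, which gives a simple baseline for comparing the two designs.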