Statistical inference is often simplified by sample-splitting. This simplification comes at the cost of introducing randomness that is not native to the data. We propose a simple procedure for sequentially aggregating statistics constructed with multiple splits of the same sample. The user specifies a bound and a nominal error rate. If the procedure is implemented twice on the same data, the nominal error rate approximates the chance that the two results differ by more than the bound. We provide a non-asymptotic analysis of the accuracy of the nominal error rate and illustrate the application of the procedure to several widely applied statistical methods.
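To make the idea of sequential aggregation concrete, the following is a minimal sketch and not the procedure analyzed in the paper. It averages a statistic over repeated random half-splits and stops once a normal approximation suggests that two independent runs would differ by more than the bound with probability at most the nominal error rate. The function name `aggregate_over_splits`, its parameters, the half-split scheme, and the stopping rule are all assumptions introduced for illustration.

```python
import math
import random
from statistics import NormalDist


def aggregate_over_splits(data, split_statistic, bound, error_rate,
                          min_splits=10, max_splits=10_000, seed=None):
    """Average a split-based statistic over random half-splits (illustrative).

    Hypothetical stopping rule: treat the running mean over k splits as
    approximately normal, so the difference between two independent runs has
    standard deviation sqrt(2 * s^2 / k), and stop once the approximate
    probability that this difference exceeds `bound` is at most `error_rate`.
    `min_splits` should be at least 2 so the sample variance is defined.
    """
    rng = random.Random(seed)
    n = len(data)
    mean, m2, k = 0.0, 0.0, 0  # Welford running mean and sum of squares
    while k < max_splits:
        # Draw a fresh random split of the sample into two halves.
        indices = list(range(n))
        rng.shuffle(indices)
        first = [data[i] for i in indices[: n // 2]]
        second = [data[i] for i in indices[n // 2:]]
        value = split_statistic(first, second)

        # Update the running mean and variance of the split statistics.
        k += 1
        delta = value - mean
        mean += delta / k
        m2 += delta * (value - mean)

        if k >= min_splits:
            variance = m2 / (k - 1)
            sd_diff = math.sqrt(2.0 * variance / k)
            if sd_diff == 0.0:
                break  # no variation across splits, nothing left to average out
            p_differ = 2.0 * (1.0 - NormalDist().cdf(bound / sd_diff))
            if p_differ <= error_rate:
                break
    return mean, k


# Usage with a toy split statistic: the mean of the first half of the split.
data = [float(i) for i in range(100)]
estimate, n_splits = aggregate_over_splits(
    data, lambda first, second: sum(first) / len(first),
    bound=0.5, error_rate=0.05, seed=0,
)
```

In this sketch, a smaller bound or a smaller error rate leads to more splits being drawn before stopping, which mirrors the trade-off the abstract describes between computation and residual randomness from splitting.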