We propose a method for non-parametric conditional distribution estimation that partitions covariate-sorted observations into contiguous bins and uses the within-bin empirical CDF as the predictive distribution. Bin boundaries are chosen to minimise the total leave-one-out Continuous Ranked Probability Score (LOO-CRPS), which admits a closed-form cost function with $O(n^2 \log n)$ precomputation and $O(n^2)$ storage; the globally optimal $K$-partition is recovered by a dynamic programme in $O(n^2 K)$ time. Minimising the within-sample LOO-CRPS is inappropriate for selecting $K$ because it suffers from in-sample optimism. We therefore select $K$ by evaluating test CRPS on an alternating held-out split, which yields a U-shaped criterion with a well-defined minimum. Having selected $K^*$ and fitted the full-data partition, we form two complementary predictive objects: the Venn prediction band, and a conformal prediction set that uses CRPS as the nonconformity score and carries a finite-sample marginal coverage guarantee at any prescribed level $\varepsilon$. On real-data benchmarks against split-conformal competitors (Gaussian split conformal, CQR, and CQR-QRF), the method produces substantially narrower prediction intervals while maintaining near-nominal coverage.
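To make the partitioning step concrete, the following is a minimal sketch of a LOO-CRPS segment cost and the $O(n^2 K)$ dynamic programme, not the reference implementation. It assumes the responses `y` are already sorted by the covariate and that every bin holds at least two points. The function names `loo_crps_costs` and `optimal_partition` are hypothetical. The closed form used here, $m W / (2(m-1)^2)$ for a bin of size $m$ with $W$ the sum of $|y_j - y_k|$ over ordered pairs, is derived from the standard CRPS expression for an empirical CDF and may differ in form from the cost function in the paper; likewise, the 2-D prefix-sum precomputation is a substitute chosen for brevity, not necessarily the $O(n^2 \log n)$ scheme referred to above.

```python
import numpy as np


def loo_crps_costs(y):
    """Return cost[i, j] = total LOO-CRPS of the bin y[i:j] (inf if j - i < 2)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Pairwise absolute differences and their 2-D prefix sums:
    # P[a, b] = sum of D[:a, :b].
    D = np.abs(y[:, None] - y[None, :])
    P = np.zeros((n + 1, n + 1))
    P[1:, 1:] = D.cumsum(axis=0).cumsum(axis=1)
    cost = np.full((n + 1, n + 1), np.inf)
    for i in range(n):
        for j in range(i + 2, n + 1):
            m = j - i
            # W = sum over ordered pairs (a, b) in the bin of |y_a - y_b|.
            W = P[j, j] - P[i, j] - P[j, i] + P[i, i]
            cost[i, j] = m * W / (2.0 * (m - 1) ** 2)
    return cost


def optimal_partition(cost, K):
    """Globally optimal split of n sorted points into K contiguous bins."""
    n = cost.shape[0] - 1
    best = np.full((K + 1, n + 1), np.inf)   # best[k, j]: first j points, k bins
    back = np.zeros((K + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, K + 1):
        for j in range(1, n + 1):
            # Last bin is y[i:j]; try every admissible start i.
            cand = best[k - 1, :j] + cost[:j, j]
            i = int(np.argmin(cand))
            best[k, j], back[k, j] = cand[i], i
    # Backtrack the bin boundaries from the end of the data.
    bounds = [n]
    for k in range(K, 0, -1):
        bounds.append(int(back[k, bounds[-1]]))
    return float(best[K, n]), bounds[::-1]


# Toy data with an obvious change point after the third observation.
y_demo = [0.0, 0.1, 0.05, 5.0, 5.2, 4.9]
total_demo, bounds_demo = optimal_partition(loo_crps_costs(y_demo), K=2)
print(bounds_demo)  # bin boundaries: [0, 3, 6]
```

Bins are the half-open index ranges between consecutive boundaries, so `[0, 3, 6]` denotes the bins `y[0:3]` and `y[3:6]`. Selecting $K$ on a held-out split would wrap this routine in an outer loop over candidate values of $K$.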