We propose a method for non-parametric conditional distribution estimation that partitions covariate-sorted observations into contiguous bins and uses the within-bin empirical CDF as the predictive distribution. Bin boundaries are chosen to minimise the total leave-one-out Continuous Ranked Probability Score (LOO-CRPS), which admits a closed-form cost function with $O(n^2 \log n)$ precomputation and $O(n^2)$ storage; the globally optimal $K$-partition is then recovered by a dynamic programme in $O(n^2 K)$ time. The minimised in-sample LOO-CRPS is not a suitable criterion for selecting $K$, because it is optimistically biased. We instead select $K$ by $k$-fold cross-validation of the test CRPS, which yields a U-shaped criterion with a well-defined minimum. Having selected $K^*$ and fitted the partition on the full data, we form two complementary predictive objects: a Venn prediction band, and a conformal prediction set that uses CRPS as the nonconformity score and carries a finite-sample marginal coverage guarantee at any prescribed significance level $\varepsilon$. The conformal predictor is transductive and data-efficient: all observations are used for both partitioning and p-value calculation, so no hold-out set needs to be reserved. On real-data benchmarks against split-conformal competitors (Gaussian split conformal, CQR, CQR-QRF, and conformalized isotonic distributional regression), the method produces substantially narrower prediction intervals while maintaining near-nominal coverage.
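To make the partitioning step concrete, the following is a minimal Python sketch of one way the optimisation could be implemented; it is not the authors' code. It assumes the responses `y` are already sorted by the covariate and that every bin holds at least two points, so that a leave-one-out empirical CDF exists. The function names `loo_crps_cost_matrix` and `optimal_partition` are hypothetical. The per-bin cost uses an identity derived here from the standard CRPS formula for an empirical CDF: for a bin of $m$ points with $S = \sum_{a,b} |y_a - y_b|$ taken over ordered pairs, the total LOO-CRPS is $mS / (2(m-1)^2)$. The paper's own closed form and its $O(n^2 \log n)$ precomputation may be organised differently; this sketch simply uses 2-D prefix sums over the pairwise distance matrix.

```python
import numpy as np


def loo_crps_cost_matrix(y):
    """cost[i, j] = total LOO-CRPS of the bin y[i:j]; inf if the bin has < 2 points."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = np.abs(y[:, None] - y[None, :])          # pairwise |y_a - y_b|
    # Zero-bordered 2-D prefix sums: p[a, b] == d[:a, :b].sum()
    p = np.zeros((n + 1, n + 1))
    p[1:, 1:] = d.cumsum(axis=0).cumsum(axis=1)
    diag = np.diag(p)
    # s[i, j] = sum of |y_a - y_b| over ordered pairs (a, b) with i <= a, b < j
    s = diag[None, :] - p - p.T + diag[:, None]
    idx = np.arange(n + 1)
    m = idx[None, :] - idx[:, None]              # bin size j - i
    cost = np.full((n + 1, n + 1), np.inf)
    valid = m >= 2
    cost[valid] = m[valid] * s[valid] / (2.0 * (m[valid] - 1) ** 2)
    return cost


def optimal_partition(y, n_bins):
    """Return (total LOO-CRPS, boundaries) of the best split of y into n_bins contiguous bins."""
    cost = loo_crps_cost_matrix(y)
    n = cost.shape[0] - 1
    if n < 2 * n_bins:
        raise ValueError("need at least two observations per bin")
    dp = np.full((n_bins + 1, n + 1), np.inf)    # dp[k, j]: best cost of k bins on y[:j]
    arg = np.zeros((n_bins + 1, n + 1), dtype=int)
    dp[0, 0] = 0.0
    for k in range(1, n_bins + 1):
        # cand[i, j] = best cost of k-1 bins on y[:i] plus one bin y[i:j]
        cand = dp[k - 1][:, None] + cost
        arg[k] = cand.argmin(axis=0)
        dp[k] = cand.min(axis=0)
    # Backtrack the boundaries from the right end.
    bounds = [n]
    for k in range(n_bins, 0, -1):
        bounds.append(int(arg[k, bounds[-1]]))
    return float(dp[n_bins, n]), bounds[::-1]
```

For example, `optimal_partition([0.0, 0.1, 0.2, 5.0, 5.1, 5.2], 2)` splits the data between the two clusters, returning boundaries `[0, 3, 6]`, so the bins are `y[0:3]` and `y[3:6]`. Each of the `n_bins` passes minimises over an $(n+1) \times (n+1)$ array, which matches the $O(n^2 K)$ time stated above, and the cost matrix accounts for the $O(n^2)$ storage.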