This paper studies distribution-free inference in settings where the data set has a hierarchical structure -- for example, groups of observations, or repeated measurements. In such settings, standard notions of exchangeability may not hold. To address this challenge, a hierarchical form of exchangeability is derived, facilitating extensions of distribution-free methods, including conformal prediction and jackknife+. The standard theoretical guarantee obtained by the conformal prediction framework is a marginal predictive coverage guarantee. In the special case of independent repeated measurements, however, it is possible to achieve a stronger form of coverage -- the "second-moment coverage" property -- which provides better control of conditional miscoverage rates; distribution-free prediction sets that achieve this property are constructed. Simulations illustrate that this guarantee indeed leads to uniformly small conditional miscoverage rates. Empirically, this stronger guarantee comes at the cost of wider prediction sets in scenarios where the fitted model is poorly calibrated, but this cost is very mild in cases where the fitted model is accurate.