We address the challenge of constructing valid confidence intervals and sets for prediction across multiple environments. We investigate two types of coverage suited to these problems and extend the jackknife and split-conformal methods to obtain distribution-free coverage in such non-traditional, hierarchical data-generating scenarios. Our contributions also include extensions to settings with non-real-valued responses and a theory of consistency for predictive inference in these general problems. We further demonstrate a novel resizing method that adapts to problem difficulty and applies both to existing approaches for predictive inference with hierarchical data and to the methods we develop. The resizing reduces prediction set sizes using limited information from the test environment, which is key to the methods' practical performance; we evaluate that performance on neurochemical sensing and species classification datasets.