Conformal prediction provides a framework for uncertainty quantification in the form of prediction intervals and prediction sets with distribution-free coverage guarantees. While recent cross-conformal techniques such as CV+ and Jackknife+-after-bootstrap achieve better data efficiency than traditional split conformal methods, they incur substantial computational costs because they require pairwise comparisons between the out-of-bag scores of training and test samples. Observing that these methods extend naturally from ensemble models, particularly random forests, we leverage existing optimized random forest implementations to enable efficient cross-conformal prediction. We present coverforest, a Python package that implements efficient conformal prediction methods specifically optimized for random forests. coverforest supports both regression and classification tasks through various conformal prediction methods, including split conformal, CV+, Jackknife+-after-bootstrap, and adaptive prediction sets. The package uses parallel computing and Cython optimizations to speed up out-of-bag calculations. Our experiments demonstrate that coverforest's predictions achieve the desired level of coverage. In addition, its training and prediction times can be 2 to 9 times faster than those of an existing implementation. The source code for coverforest is hosted on GitHub at https://github.com/donlap/coverforest.
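To make the pairwise out-of-bag comparison concrete, the following is a minimal pure-Python sketch of a Jackknife+-after-bootstrap interval for one-dimensional regression. It is not coverforest's API or implementation: the function names (`fit_line`, `predict`, `jackknife_plus_ab`) are hypothetical, a least-squares line stands in for a single tree, and the synthetic data is assumed purely for illustration. As a further simplification, the sketch raises an error if a training point happens to appear in every bootstrap sample.

```python
import math
import random


def fit_line(xs, ys):
    """Least-squares line y = a + b*x; a stand-in for one tree in a forest."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:  # degenerate bootstrap sample: fall back to a constant
        return mean_y, 0.0
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def predict(model, x):
    intercept, slope = model
    return intercept + slope * x


def jackknife_plus_ab(xs, ys, x_test, alpha=0.1, n_models=50, seed=0):
    """Return a (lower, upper) Jackknife+-after-bootstrap interval at x_test."""
    rng = random.Random(seed)
    n = len(xs)

    # Train the ensemble once, remembering which points each model saw.
    models, bags = [], []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
        bags.append(set(idx))

    lower_scores, upper_scores = [], []
    for i in range(n):
        # Only models that never saw point i may be used for point i.
        oob = [m for m, bag in zip(models, bags) if i not in bag]
        if not oob:
            raise ValueError(f"point {i} is in every bag; increase n_models")
        oob_train = sum(predict(m, xs[i]) for m in oob) / len(oob)
        oob_test = sum(predict(m, x_test) for m in oob) / len(oob)
        residual = abs(ys[i] - oob_train)
        # One (training point, test point) comparison: this pairing is what
        # makes cross-conformal methods costly when there are many test points.
        lower_scores.append(oob_test - residual)
        upper_scores.append(oob_test + residual)

    lower_scores.sort()
    upper_scores.sort()
    k_lo = math.floor(alpha * (n + 1))       # rank of the lower endpoint
    k_hi = math.ceil((1 - alpha) * (n + 1))  # rank of the upper endpoint
    lower = lower_scores[k_lo - 1] if k_lo >= 1 else -math.inf
    upper = upper_scores[k_hi - 1] if k_hi <= n else math.inf
    return lower, upper


# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
data_rng = random.Random(1)
xs = [data_rng.uniform(0.0, 1.0) for _ in range(200)]
ys = [2.0 * x + 1.0 + data_rng.gauss(0.0, 0.3) for x in xs]
lo, hi = jackknife_plus_ab(xs, ys, x_test=0.5, alpha=0.1)
print(f"90% interval at x=0.5: [{lo:.2f}, {hi:.2f}]")
```

The ensemble is trained only once, and the leave-one-out predictions are recovered by averaging the models whose bootstrap sample excluded a given point. The inner loop still touches every training point for each test point, which is the cost that an optimized forest implementation would aim to reduce.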