Inductive conformal predictors (ICPs) are algorithms that generate prediction sets rather than point predictions. These sets are valid at a user-defined confidence level under the sole assumption of exchangeability. ICPs are useful for reliable machine learning and are increasing in popularity. Developing an ICP involves dividing the development data into three parts: training, calibration and test. When development data is limited or expensive to obtain, how to divide it most efficiently remains an open question. This study presents several experiments that explore this question and considers the case for allowing examples to overlap between the training and calibration sets. The conclusions drawn will be of value to academics and practitioners planning to use ICPs.
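The three-way division described above can be illustrated with a minimal sketch of an ICP for regression. Everything specific here is an assumption made for illustration and is not taken from the study: the synthetic data, the least-squares model, the absolute-residual nonconformity score, the 50/25/25 split, and the helper names `split_indices`, `conformal_quantile` and `run_demo`. The split is kept disjoint, so the overlap case is not shown.

```python
import numpy as np


def split_indices(n, frac_train, frac_cal, rng):
    """Shuffle n indices and cut them into disjoint training, calibration and test parts."""
    perm = rng.permutation(n)
    n_train = int(frac_train * n)
    n_cal = int(frac_cal * n)
    return perm[:n_train], perm[n_train:n_train + n_cal], perm[n_train + n_cal:]


def conformal_quantile(cal_scores, epsilon):
    """Return the ceil((n + 1)(1 - epsilon))-th smallest calibration score.

    If that rank exceeds n, the calibration set is too small for the requested
    confidence level and the prediction set must be unbounded.
    """
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - epsilon)))
    if k > n:
        return np.inf
    return np.sort(cal_scores)[k - 1]


def run_demo(seed=0, n=2000, epsilon=0.1):
    """Fit on training data, calibrate on held-out data, report coverage on test data."""
    rng = np.random.default_rng(seed)

    # Hypothetical synthetic data: a noisy linear relationship.
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 2.0 * x + rng.normal(0.0, 1.0, size=n)

    train, cal, test = split_indices(n, 0.5, 0.25, rng)

    # Underlying model: ordinary least-squares line fitted on the training part only.
    coeffs = np.polyfit(x[train], y[train], deg=1)

    def predict(v):
        return np.polyval(coeffs, v)

    # Nonconformity scores: absolute residuals on the calibration part.
    cal_scores = np.abs(y[cal] - predict(x[cal]))
    q = conformal_quantile(cal_scores, epsilon)

    # Prediction set for each test point: [prediction - q, prediction + q].
    covered = np.abs(y[test] - predict(x[test])) <= q
    return covered.mean()


if __name__ == "__main__":
    print("empirical test coverage:", run_demo())
```

With `epsilon = 0.1`, the empirical test coverage should be close to the 90% confidence level, varying from run to run with the random split. Changing the fractions passed to `split_indices` is one simple way to see how the size of each part affects the width `q` of the resulting prediction sets.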