We study the problem of offline changepoint localization in a distribution-free setting. We observe a data vector containing a single changepoint and assume only that the data before and after the changepoint are i.i.d. (or, more generally, exchangeable) draws from arbitrary, unknown distributions. The goal is to construct a confidence set for the index at which the change occurs, with finite-sample coverage and no further assumptions. Existing methods often rely on parametric assumptions, tail conditions, or asymptotic approximations, or produce only point estimates. In contrast, our distribution-free algorithm, CONformal CHangepoint localization (CONCH), uses only exchangeability arguments to construct confidence sets with finite-sample coverage. By proving a conformal Neyman-Pearson lemma, we derive principled score functions that yield informative (small) confidence sets. Moreover, with such score functions, the normalized length of the confidence set shrinks to zero under weak assumptions. We also establish a universality result showing that any distribution-free changepoint localization method must be an instance of CONCH. Experiments suggest that CONCH delivers precise confidence sets even in challenging settings involving images or text.
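To make the exchangeability argument concrete, here is a minimal sketch of a generic recipe of this kind, under stated assumptions. For each candidate changepoint t, we test the hypothesis that the first t observations and the remaining n - t observations are each exchangeable, by permuting within the two segments separately. We then keep t in the confidence set when its permutation p-value exceeds α. This is not the CONCH procedure from the paper. The within-segment CUSUM score (`split_score`, `cusum_stat`) and all function names are hypothetical stand-ins for the principled score functions described above. Validity rests on the assumption that the joint distribution of the data is invariant under separate permutations of the two segments (for example, independent segments that are each i.i.d.). In that case the true changepoint is excluded with probability at most α.

```python
import random
from typing import List, Sequence


def cusum_stat(segment: Sequence[float]) -> float:
    """Largest weighted |mean difference| over all split points of a segment.

    Hypothetical score component; returns 0.0 for segments of length < 2.
    """
    m = len(segment)
    if m < 2:
        return 0.0
    total = sum(segment)
    left_sum = 0.0
    best = 0.0
    for k in range(1, m):
        left_sum += segment[k - 1]
        left_mean = left_sum / k
        right_mean = (total - left_sum) / (m - k)
        weight = (k * (m - k) / m) ** 0.5  # standard CUSUM weighting
        best = max(best, weight * abs(left_mean - right_mean))
    return best


def split_score(x: Sequence[float], t: int) -> float:
    """Hypothetical score for a change after index t (x[:t] vs x[t:]).

    Large values mean heterogeneity remains inside a segment, which is
    evidence against a changepoint at t.
    """
    return cusum_stat(x[:t]) + cusum_stat(x[t:])


def permutation_pvalue(x: Sequence[float], t: int, n_perm: int,
                       rng: random.Random) -> float:
    """Monte Carlo p-value for H_t: x[:t] and x[t:] are each exchangeable."""
    observed = split_score(x, t)
    left, right = list(x[:t]), list(x[t:])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(left)   # permute only within the pre-change segment
        rng.shuffle(right)  # permute only within the post-change segment
        if split_score(left + right, t) >= observed:
            count += 1
    # The +1 terms count the observed ordering itself, keeping the test valid.
    return (1 + count) / (1 + n_perm)


def changepoint_confidence_set(x: Sequence[float], alpha: float = 0.1,
                               n_perm: int = 199, seed: int = 0) -> List[int]:
    """Invert the tests: keep every t whose p-value exceeds alpha."""
    rng = random.Random(seed)
    n = len(x)
    return [t for t in range(1, n)
            if permutation_pvalue(x, t, n_perm, rng) > alpha]


if __name__ == "__main__":
    data = [0.0] * 10 + [5.0] * 10
    print(changepoint_confidence_set(data, alpha=0.1))
```

For example, on the noiseless sequence `[0.0] * 10 + [5.0] * 10`, the candidate t = 10 always receives p-value 1 because both segments are constant. Splits far from the change, such as t = 5 or t = 15, leave all of the heterogeneity inside one segment and are rejected. Any score function keeps the coverage guarantee. The choice of score affects only how small the resulting set is.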