Conformal inference provides a general distribution-free method to rigorously calibrate the output of any machine learning algorithm for novelty detection. While this approach has many strengths, it has the limitation of being randomized, in the sense that it may lead to different results when the same data are analyzed twice, which can hinder the interpretation of any findings. We propose to make conformal inferences more stable by leveraging suitable conformal e-values instead of p-values to quantify statistical significance. This solution allows the evidence gathered from multiple analyses of the same data to be aggregated effectively while provably controlling the false discovery rate. Further, we show that the proposed method can reduce randomness without much loss of power compared to standard conformal inference, partly thanks to an innovative way of weighting conformal e-values based on additional side information carefully extracted from the same data. Simulations with synthetic and real data confirm that this solution can be effective at eliminating random noise in the inferences obtained with state-of-the-art alternative techniques, sometimes also leading to higher power.
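To illustrate the aggregation idea in general terms, the following minimal sketch averages e-values from several repeated analyses of the same test points and then applies the e-BH procedure (the Benjamini-Hochberg rule adapted to e-values) at level alpha. It assumes the e-values have already been computed by some valid construction. It does not reproduce the conformal e-value construction or the side-information weighting proposed here, and the function names `average_e_values` and `e_bh` are hypothetical, chosen only for illustration.

```python
import numpy as np


def average_e_values(e_matrix):
    """Average e-values over K repeated analyses.

    e_matrix has shape (K, m): one row per analysis, one column per test point.
    The arithmetic mean of e-values is itself an e-value.
    """
    e_matrix = np.asarray(e_matrix, dtype=float)
    return e_matrix.mean(axis=0)


def e_bh(e_values, alpha=0.1):
    """Generic e-BH step (assumed stand-in, not the method of this work).

    Rejects the k* largest e-values, where k* is the largest k such that
    the k-th largest e-value is at least m / (alpha * k).
    Returns the sorted indices of the rejected hypotheses.
    """
    e_values = np.asarray(e_values, dtype=float)
    m = e_values.size
    order = np.argsort(-e_values)           # indices sorted by decreasing e-value
    sorted_e = e_values[order]
    k = np.arange(1, m + 1)
    passing = np.nonzero(sorted_e >= m / (alpha * k))[0]
    if passing.size == 0:
        return np.array([], dtype=int)      # no discoveries
    k_star = passing[-1] + 1
    return np.sort(order[:k_star])


# Illustrative, made-up e-values: 3 analyses of the same 5 test points.
e_matrix = [
    [90.0, 0.5, 20.0, 1.0, 0.2],
    [30.0, 0.5, 40.0, 1.0, 0.2],
    [60.0, 0.5, 30.0, 1.0, 0.2],
]
discoveries = e_bh(average_e_values(e_matrix), alpha=0.1)
print(discoveries.tolist())
```

In this toy input the averaged e-values are roughly 60, 0.5, 30, 1.0 and 0.2, and the e-BH thresholds for m = 5 and alpha = 0.1 are 50, 25, 16.7, 12.5 and 10, so only the first and third test points are reported. Because the averaged e-values depend less on any single random split, the resulting discovery set tends to vary less across repetitions than one based on a single analysis.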