Conformal inference provides a general, distribution-free method to rigorously calibrate the output of any machine learning algorithm for novelty detection. While this approach has many strengths, it has the limitation of being randomized, in the sense that it may lead to different results when the same data are analyzed twice, and this can hinder the interpretation of any findings. We propose to make conformal inferences more stable by leveraging suitable conformal e-values instead of p-values to quantify statistical significance. This solution allows the evidence gathered from multiple analyses of the same data to be aggregated effectively while provably controlling the false discovery rate. Further, we show that the proposed method can reduce randomness without much loss of power compared to standard conformal inference, partly thanks to an innovative way of weighting conformal e-values based on additional side information carefully extracted from the same data. Simulations with synthetic and real data confirm that this solution can be effective at eliminating random noise in the inferences obtained with state-of-the-art alternative techniques, sometimes also leading to higher power.
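The following is a minimal sketch of the three ingredients named above: conformal e-values, aggregation across repeated analyses of the same data, and FDR control. It is not the method proposed here. It assumes a hypothetical one-dimensional toy detector (score = distance from the training mean), a threshold `t` fixed from the training half only (a hypothetical `q`-quantile rule), and equal weights across splits; the side-information weighting described above is not reproduced. For a threshold `t` chosen without the calibration or test points, the e-value `(n + 1) * 1{S_j >= t} / (1 + #{i : S_i >= t})` has expectation at most 1 for a null test point exchangeable with the `n` calibration points. An average of e-values is again an e-value, and the e-BH procedure controls the FDR under arbitrary dependence. Together, these properties are what make averaging over random splits safe. All function names are illustrative.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def conformal_evalues(calib: Sequence[float], test: Sequence[float], t: float) -> List[float]:
    """Conformal e-values for a threshold t chosen without the calibration/test data.

    e_j = (n + 1) * 1{test_j >= t} / (1 + #{i : calib_i >= t}), higher score = more anomalous.
    If a null test score is exchangeable with the n calibration scores, E[e_j] <= 1.
    """
    n = len(calib)
    denom = 1 + sum(1 for s in calib if s >= t)
    return [(n + 1) / denom if s >= t else 0.0 for s in test]


def e_bh(evalues: Sequence[float], alpha: float) -> List[int]:
    """e-BH: reject the k* largest e-values, k* = max{k : e_(k) >= m / (alpha * k)}."""
    m = len(evalues)
    order = sorted(range(m), key=lambda j: evalues[j], reverse=True)
    k_star = 0
    for k in range(1, m + 1):
        if evalues[order[k - 1]] >= m / (alpha * k):
            k_star = k
    return sorted(order[:k_star])


def derandomized_detection(inliers: Sequence[float], test: Sequence[float],
                           n_splits: int = 10, alpha: float = 0.1, q: float = 0.99,
                           seed: int = 0) -> Tuple[List[int], List[float]]:
    """Average conformal e-values over repeated random splits, then apply e-BH."""
    rng = random.Random(seed)
    avg = [0.0] * len(test)
    for _ in range(n_splits):
        data = list(inliers)
        rng.shuffle(data)
        half = len(data) // 2
        train, calib = data[:half], data[half:]
        # Hypothetical toy detector: score = distance from the training mean.
        mu = statistics.fmean(train)
        train_scores = sorted(abs(x - mu) for x in train)
        # Threshold uses the training half only (hypothetical q-quantile rule).
        t = train_scores[int(q * (len(train_scores) - 1))]
        e = conformal_evalues([abs(x - mu) for x in calib], [abs(x - mu) for x in test], t)
        # Equal-weight average of e-values is again an e-value.
        avg = [a + ej / n_splits for a, ej in zip(avg, e)]
    return e_bh(avg, alpha), avg


if __name__ == "__main__":
    data_rng = random.Random(1)
    inliers = [data_rng.gauss(0.0, 1.0) for _ in range(1000)]
    rejected, evals = derandomized_detection(inliers, [10.0, 10.0, 0.0, 0.0])
    print(rejected, [round(e, 1) for e in evals])
```

Increasing `n_splits` makes the averaged e-values, and hence the rejection set, less sensitive to any single random split. This reduced sensitivity is the kind of stability the abstract targets; a single split would instead yield one randomized analysis.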