Conformal prediction and other randomized model-free inference techniques are attracting growing attention as general tools for rigorously calibrating the output of any machine learning algorithm for novelty detection. This paper develops a novel method for mitigating their algorithmic randomness, yielding a more interpretable and reliable framework for powerful novelty detection under false discovery rate control. The idea is to quantify the significance of each finding with suitable conformal e-values instead of p-values, which allows the evidence gathered from multiple mutually dependent analyses of the same data to be aggregated seamlessly. Further, the proposed method can reduce randomness without much loss of power, partly thanks to an innovative way of weighting conformal e-values based on side information carefully extracted from the same data. Simulations with synthetic and real data confirm that this solution can effectively eliminate random noise in the inferences obtained with state-of-the-art alternative techniques, sometimes also leading to higher power.
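To illustrate why e-values make aggregation across dependent analyses straightforward, the following is a minimal sketch rather than the paper's method. It assumes each repeated analysis has already produced one e-value per test point, combines them with a plain unweighted average (the arithmetic mean of e-values is again an e-value, regardless of dependence), and then applies the e-BH procedure, a standard filter for false discovery rate control with e-values. The abstract does not specify how the conformal e-values are constructed, how they are weighted, or which filter is used, so the function names `average_e_values` and `e_bh` and all numbers below are hypothetical.

```python
from typing import List, Sequence


def average_e_values(runs: Sequence[Sequence[float]]) -> List[float]:
    """Average the e-values of each test point across repeated analyses.

    Assumption: equal weights. The mean of e-values is itself an e-value
    even when the runs are mutually dependent.
    """
    n = len(runs[0])
    if any(len(run) != n for run in runs):
        raise ValueError("all runs must score the same test points")
    return [sum(run[j] for run in runs) / len(runs) for j in range(n)]


def e_bh(e_values: Sequence[float], alpha: float) -> List[int]:
    """Return the indices rejected by the e-BH procedure at level alpha.

    Sort e-values in decreasing order, find the largest k such that
    k * e_(k) >= n / alpha, and reject the k largest e-values.
    """
    n = len(e_values)
    order = sorted(range(n), key=lambda j: e_values[j], reverse=True)
    k_star = 0
    for k, j in enumerate(order, start=1):
        if k * e_values[j] >= n / alpha:
            k_star = k
    return sorted(order[:k_star])


# Hypothetical e-values for 5 test points from 3 dependent analyses.
runs = [
    [40.0, 0.5, 30.0, 1.0, 0.2],
    [60.0, 0.8, 10.0, 0.5, 0.1],
    [50.0, 0.2, 35.0, 0.9, 0.3],
]

per_run = [e_bh(run, alpha=0.2) for run in runs]  # findings vary by run
averaged = average_e_values(runs)
discoveries = e_bh(averaged, alpha=0.2)           # one aggregated answer
```

In this toy input the second analysis alone flags only test point 0, while the other two flag points 0 and 2. Filtering the averaged e-values gives a single list of discoveries that does not depend on which run happened to be reported.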