This paper studies the semi-supervised novelty detection problem, in which a set of "typical" measurements is available to the researcher. Motivated by recent advances in multiple testing and conformal inference, we propose AdaDetect, a flexible method that can wrap around any probabilistic classification algorithm and control the false discovery rate (FDR) on detected novelties in finite samples, with no distributional assumption other than exchangeability. In contrast to classical FDR-controlling procedures, which are often committed to a pre-specified p-value function, AdaDetect learns the transformation in a data-adaptive manner to focus power on the directions that distinguish inliers from outliers. Drawing on the multiple testing literature, we further propose variants of AdaDetect that adapt to the proportion of nulls while maintaining finite-sample FDR control. The methods are illustrated on synthetic and real-world datasets, including an application in astrophysics.
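To make the ingredients named above concrete, the following is a minimal sketch of how a score function can be combined with conformal (empirical) p-values and the Benjamini-Hochberg procedure to flag novelties at a target FDR level. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not spell out the steps, so the split of the "typical" sample into a training part and a calibration part, the function names (`conformal_pvalues`, `benjamini_hochberg`, `detect_novelties`, `fit_distance_score`), and the distance-based score are all hypothetical choices. In practice the score would presumably come from a probabilistic classifier; here a simple distance to the training mean stands in for it.

```python
import numpy as np


def conformal_pvalues(cal_scores, test_scores):
    """Empirical p-values; a larger score means 'more novel'."""
    cal = np.sort(np.asarray(cal_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    # Number of calibration scores that are >= each test score.
    n_ge = len(cal) - np.searchsorted(cal, test, side="left")
    return (1 + n_ge) / (len(cal) + 1)


def benjamini_hochberg(pvals, alpha):
    """Boolean mask of hypotheses rejected by the BH step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest sorted index under its threshold
        rejected[order[: k + 1]] = True
    return rejected


def fit_distance_score(train, mixed):
    """Hypothetical stand-in score: distance to the training mean.

    `mixed` (calibration + test points, unlabeled) is ignored here; a
    data-adaptive score could use it, as long as it treats calibration
    and test points symmetrically.
    """
    center = train.mean(axis=0)
    return lambda x: np.linalg.norm(x - center, axis=1)


def detect_novelties(null_sample, test_sample, fit_score, alpha=0.1):
    """Split the typical sample, score, compute p-values, apply BH."""
    null_sample = np.asarray(null_sample, dtype=float)
    test_sample = np.asarray(test_sample, dtype=float)
    k = len(null_sample) // 2  # assumed split: half train, half calibration
    train, cal = null_sample[:k], null_sample[k:]
    mixed = np.concatenate([cal, test_sample])
    score = fit_score(train, mixed)
    pvals = conformal_pvalues(score(cal), score(test_sample))
    return benjamini_hochberg(pvals, alpha)


# Synthetic illustration: 90 inliers followed by 10 clearly shifted outliers.
rng = np.random.default_rng(0)
typical = rng.normal(size=(1000, 2))
inliers = rng.normal(size=(90, 2))
outliers = rng.normal(loc=6.0, size=(10, 2))
test_points = np.concatenate([inliers, outliers])

rejected = detect_novelties(typical, test_points, fit_distance_score, alpha=0.1)
print("flagged as novel:", int(rejected.sum()), "of", len(test_points))
```

The score-fitting step only ever sees the calibration and test points as one unlabeled pool, which is the kind of symmetric treatment an exchangeability-based guarantee relies on. Swapping `fit_distance_score` for a learned classifier score would leave the rest of the sketch unchanged.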