Unsupervised outlier detection is attractive because it eliminates the need for labeled data. Moreover, forming multi-model ensembles can improve detection robustness. However, composing an ensemble without labeled data is challenging. Naively composed ensembles can suffer from ensemble saturation, where redundant or unreliable detection models degrade performance and incur unnecessary computation. We propose MetaEns, an automatic unsupervised framework for selecting ensembles of outlier detection models. Using labeled meta-datasets, MetaEns learns a model that predicts marginal ensemble gains, estimating the expected improvement from adding a candidate model to a partially constructed ensemble. At test time, this learned signal is combined with a submodular-inspired proxy objective that enforces diminishing returns through diversity-aware discounting and family-level risk regularization, thereby enabling greedy sequential selection with adaptive early stopping. As a result, MetaEns constructs compact, high-quality ensembles without access to ground-truth labels on the target dataset. Experiments on 39 real-world datasets show that MetaEns consistently outperforms state-of-the-art unsupervised model selectors and ensemble baselines, achieving higher average precision while using fewer models.
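The greedy selection loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the MetaEns implementation. `predicted_gain` is a hypothetical stand-in for the learned marginal-gain model. Redundancy is assumed to be the maximum absolute Pearson correlation between a candidate's outlier scores and those of already-selected members. The family-level risk term is assumed to be a linear penalty per already-selected model from the same family. Early stopping is assumed to fire once the best adjusted gain falls to `min_gain` or below. The names `Candidate`, `greedy_select`, `family_penalty`, `min_gain`, and `max_size` are illustrative only.

```python
# Hypothetical sketch of greedy ensemble selection with a discounted gain proxy.
# Not the MetaEns implementation; all names and formulas here are assumptions.
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A fitted detector (hypothetical container, not from the paper)."""
    name: str
    family: str               # detector family label, e.g. "knn" or "iforest"
    scores: Sequence[float]   # its unsupervised outlier scores on the target data


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two score vectors; 0.0 if either is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def greedy_select(
    candidates: List[Candidate],
    predicted_gain: Callable[[Candidate, List[Candidate]], float],
    family_penalty: float = 0.1,
    min_gain: float = 0.0,
    max_size: int = 10,
) -> List[Candidate]:
    """Greedily grow an ensemble using a discounted, penalized gain proxy."""
    ensemble: List[Candidate] = []
    remaining = list(candidates)
    while remaining and len(ensemble) < max_size:
        best: Optional[Candidate] = None
        best_value = -math.inf
        for c in remaining:
            # Stand-in for the learned meta-model's marginal-gain prediction.
            gain = predicted_gain(c, ensemble)
            # Diversity-aware discount: the more c's scores correlate with an
            # already-selected member, the less of its predicted gain counts.
            redundancy = max(
                (abs(pearson(c.scores, m.scores)) for m in ensemble), default=0.0
            )
            # Family-level risk: penalize each repeat draw from the same family.
            same_family = sum(1 for m in ensemble if m.family == c.family)
            value = gain * (1.0 - redundancy) - family_penalty * same_family
            if value > best_value:
                best, best_value = c, value
        # Adaptive early stopping: quit once no candidate clears the threshold.
        if best is None or best_value <= min_gain:
            break
        ensemble.append(best)
        remaining.remove(best)
    return ensemble


# Toy usage: a hand-written gain function that shrinks as the ensemble grows.
base = {"A": 0.5, "B": 0.45, "C": 0.3}


def toy_gain(c: Candidate, ensemble: List[Candidate]) -> float:
    return base[c.name] / (1 + len(ensemble))


cands = [
    Candidate("A", "knn", [0.1, 0.2, 0.9, 0.3]),
    Candidate("B", "knn", [0.1, 0.2, 0.9, 0.3]),  # duplicates A's scores
    Candidate("C", "iforest", [0.9, 0.1, 0.2, 0.4]),
]
print([c.name for c in greedy_select(cands, toy_gain)])  # → ['A', 'C']
```

Because the redundancy discount and the family penalty can only grow as the ensemble grows, a candidate's adjusted value never increases across rounds as long as its predicted gain is non-negative and non-increasing. This is the diminishing-returns behavior the proxy objective is meant to encourage. In the toy run, B is never selected because its scores duplicate A's, and selection stops after two models even though `max_size` allows more.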