Data cubes are multidimensional databases, often built from several separate databases, that serve as a flexible basis for data analysis. Surprisingly, outlier detection on data cubes has not yet been treated extensively. In this work, we provide the first framework to evaluate robust outlier detection methods in data cubes (RODD). We introduce a novel random-forest-based outlier detection approach (RODD-RF) and compare it with more traditional methods based on robust location estimators. We propose a general type of test data and examine all methods in a simulation study. Moreover, we apply RODD-RF to real-world data. The results show that RODD-RF can lead to improved outlier detection.