Outlier detection is critical in real-world applications such as preventing financial fraud, defending against network intrusions, and detecting imminent device failures. To reduce the human effort needed to evaluate outlier detection results and to turn outliers into actionable insights, users often expect a system to automatically produce interpretable summaries of subgroups of the detection results. Unfortunately, to date no such system exists. To fill this gap, we propose STAIR, which learns a compact set of human-understandable rules to summarize and explain outlier detection results. Rather than using classical decision tree algorithms to produce these rules, STAIR introduces a new optimization objective that yields a small number of rules with minimal complexity, and hence strong interpretability, while accurately summarizing the detection results. The learning algorithm of STAIR produces a rule set by iteratively splitting large rules and is optimal in maximizing this objective at each iteration. Moreover, to handle high-dimensional, highly complex datasets that are hard to summarize with simple rules, we propose a localized variant of STAIR, called L-STAIR. Taking data locality into consideration, L-STAIR simultaneously partitions the data and learns a set of localized rules for each partition. Our experimental study on many outlier benchmark datasets shows that, compared to decision tree methods, STAIR significantly reduces the complexity of the rules required to summarize the outlier detection results, making them more amenable to human understanding and evaluation.
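To make the idea of iteratively splitting rules under a complexity-aware objective concrete, the following is a minimal sketch and not the STAIR algorithm itself. The abstract does not state the objective, so the one used here (accuracy minus a fixed penalty per predicate) is an assumption, as are the names `Rule`, `objective`, `candidate_splits`, and `learn_rules` and the exhaustive greedy search over midpoint thresholds. Labels are assumed to be the output of some outlier detector (1 for outlier, 0 for inlier).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    # Each predicate is (feature_index, op, threshold), op in {"<=", ">"}.
    predicates: tuple = ()

    def covers(self, x):
        return all((x[f] <= t) if op == "<=" else (x[f] > t)
                   for f, op, t in self.predicates)


def correct_count(rule, X, y):
    """Points a rule gets right when it predicts its majority label."""
    labels = [yi for xi, yi in zip(X, y) if rule.covers(xi)]
    ones = sum(labels)
    return max(ones, len(labels) - ones)


def objective(rules, X, y, penalty):
    """Hypothetical score: accuracy minus a penalty per predicate."""
    accuracy = sum(correct_count(r, X, y) for r in rules) / len(X)
    complexity = sum(len(r.predicates) for r in rules)
    return accuracy - penalty * complexity


def candidate_splits(rule, X):
    """Yield (left, right) child rules at midpoints between covered values."""
    covered = [xi for xi in X if rule.covers(xi)]
    for f in range(len(X[0])):
        values = sorted({xi[f] for xi in covered})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            yield (Rule(rule.predicates + ((f, "<=", t),)),
                   Rule(rule.predicates + ((f, ">", t),)))


def learn_rules(X, y, penalty=0.01, max_iters=20):
    """Greedily apply the single split that most improves the objective."""
    rules = [Rule()]  # start with one rule that covers every point
    best = objective(rules, X, y, penalty)
    for _ in range(max_iters):
        best_rules = None
        for i, rule in enumerate(rules):
            for left, right in candidate_splits(rule, X):
                trial = rules[:i] + [left, right] + rules[i + 1:]
                score = objective(trial, X, y, penalty)
                if score > best:
                    best, best_rules = score, trial
        if best_rules is None:  # no split improves the score, so stop
            break
        rules = best_rules
    return rules


# Toy data: one feature, detector labels (1 = outlier) for the last two points.
X = [(1.0,), (2.0,), (3.0,), (10.0,), (11.0,)]
y = [0, 0, 0, 1, 1]
for rule in learn_rules(X, y):
    print(rule.predicates)
```

Because every split replaces one rule with two longer ones, the penalty term stops the search as soon as extra predicates no longer buy enough accuracy. That is the trade-off between summary accuracy and rule complexity that the abstract describes, although the actual objective and its optimality guarantee may differ from this sketch.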