Outlier detection is critical in real applications to prevent financial fraud, defend against network intrusions, or detect imminent device failures. To reduce the human effort in evaluating outlier detection results and to effectively turn outliers into actionable insights, users often expect a system to automatically produce interpretable summarizations of subgroups of outlier detection results. Unfortunately, to date no such system exists. To fill this gap, we propose STAIR, which learns a compact set of human-understandable rules to summarize and explain anomaly detection results. Rather than using classical decision tree algorithms to produce these rules, STAIR proposes a new optimization objective that yields a small number of rules with the least complexity, and hence strong interpretability, while accurately summarizing the detection results. The learning algorithm of STAIR produces a rule set by iteratively splitting the large rules and is optimal in maximizing this objective in each iteration. Moreover, to effectively handle high-dimensional, highly complex data sets that are hard to summarize with simple rules, we propose a localized STAIR approach, called L-STAIR. Taking data locality into consideration, it simultaneously partitions the data and learns a set of localized rules for each partition. Our experimental study on many outlier benchmark datasets shows that, compared to decision tree methods, STAIR significantly reduces the complexity of the rules required to summarize the outlier detection results, making them easier for humans to understand and evaluate.