Semi-supervised anomaly detection is based on the principle that potential anomalies are those records that look different from normal training data. However, in some cases we are specifically interested in anomalies that correspond to high attribute values (or low values, but not both). We present two asymmetric distance measures that take this directionality into account: ramp distance and signed distance. Through experiments on synthetic and real-life datasets, we show that ramp distance performs as well as or better than the absolute distance traditionally used in anomaly detection. While signed distance also performs well on synthetic data, it performs substantially worse on real-life datasets. We argue that this reflects the fact that, in practice, good scores on some attributes should not be allowed to compensate for bad scores on others.
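The contrast between the three measures can be illustrated with a minimal sketch. The abstract does not give formal definitions, so the per-attribute forms below are an assumption: absolute distance sums |x_i - y_i|, ramp distance sums max(x_i - y_i, 0), and signed distance sums x_i - y_i, with high values treated as the anomalous direction. The function names and the example vectors are hypothetical.

```python
# Illustrative sketch only: these definitions are assumed, not taken from the paper.
# Each function compares a query record x against a normal record y, attribute by attribute.

def absolute_distance(x, y):
    # Symmetric: deviations in either direction count equally.
    return sum(abs(a - b) for a, b in zip(x, y))

def ramp_distance(x, y):
    # Asymmetric: only attributes where x exceeds y contribute; lower values are ignored.
    return sum(max(a - b, 0.0) for a, b in zip(x, y))

def signed_distance(x, y):
    # Asymmetric: lower values contribute negatively and can offset higher values.
    return sum(a - b for a, b in zip(x, y))

normal = [1.0, 1.0]
query = [3.0, -1.0]  # high on the first attribute, low on the second

print(absolute_distance(query, normal))  # 4.0
print(ramp_distance(query, normal))      # 2.0
print(signed_distance(query, normal))    # 0.0
```

Under these assumed definitions, the signed distance of the query is zero because the low second attribute cancels the high first attribute, whereas the ramp distance still registers the high value. This is the compensation effect that the final sentence of the abstract argues against.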