Most existing methods for text anomaly detection build models from unlabeled data alone. They assume that no labeled anomalous examples are available, so they cannot exploit the prior knowledge carried by the small number of labeled anomalies that many real-world applications do provide. Furthermore, these models prioritize learning feature embeddings rather than optimizing anomaly scores directly, which can lead to suboptimal anomaly scoring and inefficient use of data during learning. In this paper, we introduce FATE, a deep few-shot learning-based framework that leverages a limited number of anomaly examples and learns anomaly scores explicitly, in an end-to-end manner, using deviation learning. In this approach, the anomaly scores of normal examples are pushed to closely match reference scores drawn from a prior distribution, whereas anomalous examples are forced to have scores that deviate considerably from the reference score, into the upper tail of the prior. Additionally, our model is optimized to learn the distinct behavior of anomalies by using a multi-head self-attention layer and multiple instance learning. Comprehensive experiments on several benchmark datasets demonstrate that the proposed approach achieves new state-of-the-art performance.
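The deviation-learning objective described above can be illustrated with a small sketch. It is not FATE's implementation: it assumes a standard normal prior, 5000 reference samples, and a margin of 5 standard deviations, none of which the abstract states, and the function name `deviation_loss` is hypothetical. Normal examples are penalized for any deviation from the reference score, while anomalies are penalized only until their score lies at least `margin` standard deviations above it.

```python
import numpy as np


def deviation_loss(scores, labels, margin=5.0, n_ref=5000, rng=None):
    """Illustrative deviation loss (assumed form, not taken from the paper).

    scores: predicted anomaly scores, one per example.
    labels: 0 for normal examples, 1 for labeled anomalies.
    margin, n_ref: assumed hyperparameters; the abstract does not specify them.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)

    # Reference scores drawn from the prior (assumed here: standard normal).
    ref = rng.standard_normal(n_ref)
    mu, sigma = ref.mean(), ref.std()

    # Deviation of each score from the reference, in units of sigma.
    dev = (scores - mu) / sigma

    # Normal examples: pull the score toward the reference score.
    normal_term = (1.0 - labels) * np.abs(dev)
    # Anomalies: push the score at least `margin` sigmas into the upper tail.
    anomaly_term = labels * np.maximum(0.0, margin - dev)

    return float(np.mean(normal_term + anomaly_term))


# Scores that agree with the labels should give a lower loss than scores that do not.
well_separated = deviation_loss([0.0, 0.1, 6.0], [0, 0, 1])
poorly_separated = deviation_loss([6.0, 5.0, 0.0], [0, 0, 1])
print(well_separated, poorly_separated)
```

In a trained model the scores would come from the network's scoring head and the loss would be minimized by gradient descent; the NumPy version here only shows how the two terms treat normal and anomalous examples differently.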