Anomaly detection is essential in many application domains, such as cyber security, law enforcement, medicine, and fraud protection. However, the decision-making of current deep learning approaches is notoriously hard to understand, which often limits their practical applicability. To overcome this limitation, we propose a framework for learning inherently interpretable anomaly detectors from sequential data. More specifically, we consider the task of learning a deterministic finite automaton (DFA) from a given multiset of unlabeled sequences. We show that this problem is computationally hard and develop two learning algorithms based on constraint optimization. Moreover, we introduce novel regularization schemes for our optimization problems that improve the overall interpretability of our DFAs. Using a prototype implementation, we demonstrate that our approach achieves promising results in terms of accuracy and F1 score.
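To make the notion of a DFA-based anomaly detector concrete, the following is a minimal sketch of how such a detector could be applied once it has been learned. It does not show the learning algorithms themselves. The class `DFA`, the function `is_anomalous`, the example automaton `EXAMPLE_DFA`, and the convention that rejected sequences are flagged as anomalous are all assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """A deterministic finite automaton over single-character symbols."""
    initial: str
    accepting: set
    transitions: dict  # maps (state, symbol) -> next state

    def accepts(self, sequence):
        """Run the DFA on the sequence and report whether it ends in an accepting state."""
        state = self.initial
        for symbol in sequence:
            key = (state, symbol)
            if key not in self.transitions:
                # Missing transitions are treated as rejection (illustrative choice).
                return False
            state = self.transitions[key]
        return state in self.accepting


def is_anomalous(dfa, sequence):
    """Assumed convention: a sequence is anomalous if the DFA rejects it."""
    return not dfa.accepts(sequence)


# Hypothetical hand-written detector: "normal" sequences over {a, b}
# never contain two consecutive b's.
EXAMPLE_DFA = DFA(
    initial="q0",
    accepting={"q0", "q1"},
    transitions={
        ("q0", "a"): "q0",
        ("q0", "b"): "q1",
        ("q1", "a"): "q0",
        ("q1", "b"): "q2",  # q2 is a rejecting sink state
        ("q2", "a"): "q2",
        ("q2", "b"): "q2",
    },
)

if __name__ == "__main__":
    for seq in ["abab", "abba", ""]:
        print(repr(seq), "anomalous:", is_anomalous(EXAMPLE_DFA, seq))
```

Because the detector is a small explicit automaton, a flagged sequence can be explained by tracing the states it visits. In this example, any sequence that reaches `q2` was flagged because it contains the substring `bb`.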