Facade: High-Precision Insider Threat Detection Using Deep Contextual Anomaly Detection

Alex Kantchelian, Casper Neo, Ryan Stevens, Hyungwon Kim, Zhaohao Fu, Sadegh Momeni, Birkett Huber, Elie Bursztein, Yanis Pavlidis, Senaka Buthpitiya, Martin Cochran, Massimiliano Poletto

From arXiv, under review

We present Facade (Fast and Accurate Contextual Anomaly DEtection): a high-precision, deep-learning-based anomaly detection system deployed at Google (a large technology company) as the last line of defense against insider threats since 2018. Facade is an unsupervised action-context system that detects suspicious actions by considering the context surrounding each action, including relevant facts about the user and other entities involved. It is built around a new multi-modal model trained on corporate document access, SQL query, and HTTP/RPC request logs. To overcome the scarcity of incident data, Facade uses a novel contrastive learning strategy that relies solely on benign data. Its featurization of history and of the implicit social network efficiently handles the frequent out-of-distribution events that occur in a rapidly changing corporate environment, and sustains Facade's high-precision performance for a full year after training. Beyond the core model, Facade contributes a clustering approach based on user and action embeddings that improves detection robustness and achieves high-precision, multi-scale detection. Functionally, what sets Facade apart from existing anomaly detection systems is its high precision. It detects insider attackers with a false positive rate lower than 0.01%. For single rogue actions, such as illegitimate access to a sensitive document, the false positive rate is as low as 0.0003%. To the best of our knowledge, Facade is the only published insider risk anomaly detection system that helps secure such a large corporate environment.
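
To make the idea of contrastive learning on benign data alone more concrete, the following is a minimal sketch and not Facade's actual implementation. The abstract does not describe how contrastive pairs are built, so everything here is an assumption made for illustration: the toy two-dimensional embeddings, the rule of pairing a benign action with another user's context to form a synthetic negative, the cosine distance, the margin loss, and all function names (`cosine_distance`, `contrastive_triples`, `margin_loss`).

```python
import math
import random


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def contrastive_triples(events, rng):
    """Build (own_context, other_context, action) triples from benign events.

    Assumption for illustration: a benign action paired with a different
    user's context stands in for a suspicious (negative) example, so no
    real incident data is needed.
    """
    triples = []
    for user, context, action in events:
        other_contexts = [c for u, c, _ in events if u != user]
        if other_contexts:
            triples.append((context, rng.choice(other_contexts), action))
    return triples


def margin_loss(triples, margin=0.5):
    """Mean hinge loss: an action should sit closer to its own context
    than to the mismatched context by at least `margin`."""
    losses = [
        max(0.0, margin
            + cosine_distance(own, action)
            - cosine_distance(other, action))
        for own, other, action in triples
    ]
    return sum(losses) / len(losses)


# Hypothetical benign log: (user, context embedding, action embedding).
events = [
    ("alice", [1.0, 0.0], [0.9, 0.1]),
    ("bob", [0.0, 1.0], [0.1, 0.9]),
]
triples = contrastive_triples(events, random.Random(0))
loss = margin_loss(triples)
```

In this toy setup each action already lies close to its own user's context, so the hinge term vanishes. A trained model would be expected to learn embeddings with that property, after which a large distance between an action and its context could serve as an anomaly score.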
