LAN: Learning Adaptive Neighbors for Real-Time Insider Threat Detection

Enterprises and organizations face potential threats from insider employees that may lead to serious consequences. Previous studies on insider threat detection (ITD) mainly focus on detecting abnormal users or abnormal time periods (e.g., a week or a day). However, a user may have hundreds of thousands of activities in the log, and even a single day may contain thousands of activities for one user, so verifying the flagged users or time periods requires a high investigation budget. In addition, existing works are mainly post-hoc methods rather than real-time detectors, and therefore cannot report insider threats before they cause loss. In this paper, we conduct the first study of real-time ITD at the activity level and present LAN, a fine-grained and efficient framework. Specifically, LAN simultaneously learns the temporal dependencies within an activity sequence and the relationships between activities across sequences by means of graph structure learning. Moreover, to mitigate the data imbalance problem in ITD, we propose a novel hybrid prediction loss, which integrates self-supervision signals from normal activities and supervision signals from abnormal activities into a unified loss for anomaly detection. We evaluate LAN on two widely used datasets, CERT r4.2 and CERT r5.2. Extensive comparative experiments demonstrate the superiority of LAN, which outperforms 9 state-of-the-art baselines by at least 9.92% and 6.35% in AUC for real-time ITD on CERT r4.2 and r5.2, respectively. LAN can also be applied to post-hoc ITD, where it surpasses 8 competitive baselines by at least 7.70% and 4.03% in AUC on the two datasets. Finally, an ablation study, a parameter analysis, and a compatibility analysis evaluate the impact of each module and hyper-parameter in LAN. The source code is available at https://github.com/Li1Neo/LAN.
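To make the idea of a hybrid prediction loss more concrete, the following is a minimal sketch of one way such a loss could be written. It is not the formulation used in LAN: the abstract gives no equations, so the next-activity prediction setup, the per-term form, and the names `softmax`, `hybrid_prediction_loss`, and `anomaly_score` are all assumptions made for illustration. The sketch assumes the model outputs logits over a vocabulary of activity types, that normal activities should be predicted with high probability (self-supervision), and that labeled abnormal activities should be predicted with low probability (supervision).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_prediction_loss(logits_batch, observed, is_abnormal, eps=1e-12):
    """Illustrative hybrid loss (an assumption, not the paper's definition).

    logits_batch: one list of logits per activity, over activity types.
    observed:     index of the activity type that actually occurred.
    is_abnormal:  True if the activity is labeled abnormal.
    """
    if not logits_batch:
        raise ValueError("empty batch")
    total = 0.0
    for logits, target, abnormal in zip(logits_batch, observed, is_abnormal):
        p = softmax(logits)[target]
        if abnormal:
            # Supervised term: push down the probability of the abnormal activity.
            total += -math.log(max(1.0 - p, eps))
        else:
            # Self-supervised term: ordinary cross-entropy on the normal activity.
            total += -math.log(max(p, eps))
    return total / len(logits_batch)


def anomaly_score(logits, target):
    """Score an activity by how unlikely the model found it (higher = more anomalous)."""
    return 1.0 - softmax(logits)[target]
```

Under these assumptions, both terms pull in the same direction at inference time: an activity that the model assigns low probability receives a high anomaly score, whether it resembles a labeled threat or simply deviates from the learned normal behavior.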
