Enterprises and organizations face potential threats from insiders (their own employees) that may lead to serious consequences. Previous studies on insider threat detection (ITD) mainly focus on detecting abnormal users or abnormal time periods (e.g., a week or a day). However, a user may have hundreds of thousands of activities in the log, and even a single day may contain thousands of activities for one user, so verifying the flagged users or time periods requires a high investigation budget. In addition, existing works are mainly post-hoc methods rather than real-time detectors, and therefore cannot report insider threats before they cause loss. In this paper, we conduct the first study towards real-time ITD at the activity level and present LAN, a fine-grained and efficient framework. Specifically, LAN simultaneously learns the temporal dependencies within an activity sequence and the relationships between activities across sequences with graph structure learning. Moreover, to mitigate the data imbalance problem in ITD, we propose a novel hybrid prediction loss, which integrates self-supervision signals from normal activities and supervision signals from abnormal activities into a unified loss for anomaly detection. We evaluate LAN on two widely used datasets, CERT r4.2 and CERT r5.2. Extensive comparative experiments demonstrate the superiority of LAN, which outperforms 9 state-of-the-art baselines by at least 9.92% and 6.35% in AUC for real-time ITD on CERT r4.2 and r5.2, respectively. LAN can also be applied to post-hoc ITD, where it surpasses 8 competitive baselines by at least 7.70% and 4.03% in AUC on the two datasets, respectively. Finally, an ablation study, a parameter analysis, and a compatibility analysis evaluate the impact of each module and hyper-parameter in LAN.
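To make the idea of a hybrid prediction loss more concrete, the following is a minimal sketch, not the formulation used in LAN. It assumes the model outputs a probability distribution over the next activity. Normal activities contribute a self-supervised next-activity term, which rewards a high probability for the observed activity. Labeled abnormal activities contribute a supervised term, which rewards a low probability for it. The function name `hybrid_prediction_loss`, the `abnormal_weight` parameter, and the exact form of both terms are illustrative assumptions.

```python
import math


def hybrid_prediction_loss(probs, observed, is_abnormal,
                           abnormal_weight=1.0, eps=1e-12):
    """Illustrative hybrid loss (an assumption, not the paper's exact formula).

    probs:        list of per-step probability vectors over activity types
    observed:     list of indices of the activity actually observed at each step
    is_abnormal:  list of booleans, True where the step is a labeled anomaly
    """
    if not probs:
        raise ValueError("at least one prediction is required")

    total = 0.0
    for p, y, abnormal in zip(probs, observed, is_abnormal):
        # Clamp to avoid log(0) at either end of the probability range.
        p_obs = min(max(p[y], eps), 1.0 - eps)
        if abnormal:
            # Supervised term: push the probability of the abnormal activity down.
            total += abnormal_weight * -math.log(1.0 - p_obs)
        else:
            # Self-supervised term: push the probability of the normal activity up.
            total += -math.log(p_obs)
    return total / len(probs)


if __name__ == "__main__":
    # Two steps over three hypothetical activity types; the second is labeled abnormal.
    probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
    print(hybrid_prediction_loss(probs, observed=[0, 1], is_abnormal=[False, True]))
```

Under this sketch, an anomaly score for a new activity could be derived from how unlikely the model finds it, for example `1 - p[observed]`. Such a score needs only the activities seen so far, which is compatible with activity-level, real-time detection.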