With the increasing sophistication of Advanced Persistent Threats (APTs), the demand for effective detection and mitigation methods has escalated. Program execution leaves traces in the system audit log, which can be analyzed to detect malicious activity. However, collecting and analyzing large volumes of audit logs over extended periods is challenging, and the scarcity of labels further limits their usability. To address these challenges, this paper introduces SAGA (Synthetic Audit log Generation for APT campaigns), a novel approach for generating synthetic audit logs with fine-grained labels that mimic real-world system logs while embedding stealthy APT attacks. SAGA generates configurable audit logs of arbitrary duration, blending benign logs from normal operations with malicious logs based on the definitions in the MITRE ATT&CK framework. Malicious audit logs follow an APT lifecycle, incorporating various attack techniques at each stage. These synthetic logs can serve as benchmark datasets for training machine learning models and assessing diverse APT detection methods. To demonstrate the usefulness of synthetic audit logs, we ran established baselines for event-based technique hunting and APT campaign detection on various synthetic audit logs. In addition, we show that a deep learning model trained on synthetic audit logs can detect previously unseen techniques within audit logs.