Multi-source logs provide a comprehensive view of ongoing system activity and enable in-depth analysis to detect potential threats. A practical approach to threat detection is to explicitly extract entity triples (subject, action, object) and use them to build provenance graphs that support the analysis of system behavior. However, current log parsing methods mainly focus on retrieving parameters and events from raw logs, while approaches based on entity extraction are limited to processing a single type of log. To address these gaps, we propose a novel unified framework, UTLParser. UTLParser uses semantic analysis to construct causal graphs by merging multiple sub-graphs built from the individual log sources in a labeled log dataset. It leverages domain knowledge from threat hunting, such as Points of Interest. We further explore log generation delays and provide interfaces for optimized temporal graph querying. Our experiments show that UTLParser overcomes drawbacks of other log parsing methods. Furthermore, UTLParser precisely extracts explicit causal threat information while remaining compatible with a wide range of downstream tasks.
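To make the idea of entity triples and sub-graph merging concrete, the following minimal Python sketch extracts (subject, action, object) triples from two toy log sources and merges them into one time-ordered graph. It is an illustration only and not UTLParser's implementation: the log formats, the regular expressions (used here in place of semantic analysis), and the names `Triple`, `PATTERNS`, `extract`, `build_graph`, and `query` are all assumptions introduced for this example.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    ts: float      # event timestamp in seconds
    subject: str
    action: str
    obj: str
    source: str    # name of the log the triple came from


# Hypothetical per-source patterns for two invented log formats.
# Real logs would need far richer rules than these regexes.
PATTERNS = {
    "auth": re.compile(
        r"(?P<ts>\d+(?:\.\d+)?) user=(?P<subject>\S+) "
        r"(?P<action>login) host=(?P<obj>\S+)"
    ),
    "proc": re.compile(
        r"(?P<ts>\d+(?:\.\d+)?) (?P<subject>\S+) "
        r"(?P<action>exec|read|write) (?P<obj>\S+)"
    ),
}


def extract(source, line):
    """Return a Triple for one log line, or None if the line does not match."""
    m = PATTERNS[source].fullmatch(line.strip())
    if m is None:
        return None
    return Triple(float(m["ts"]), m["subject"], m["action"], m["obj"], source)


def build_graph(logs):
    """Merge the per-source triples into one adjacency map keyed by subject.

    Entities that share a name across sources end up on the same node,
    which is what links the individual sub-graphs together.
    """
    graph = defaultdict(list)
    for source, lines in logs.items():
        for line in lines:
            triple = extract(source, line)
            if triple is not None:
                graph[triple.subject].append(triple)
    for edges in graph.values():
        edges.sort(key=lambda t: t.ts)  # keep each node's edges in time order
    return graph


def query(graph, start, end):
    """Return all edges with start <= ts <= end, ordered by timestamp."""
    hits = [t for edges in graph.values() for t in edges if start <= t.ts <= end]
    return sorted(hits, key=lambda t: t.ts)


logs = {
    "auth": ["100.0 user=alice login host=web01"],
    "proc": [
        "101.5 alice exec /bin/bash",
        "102.0 /bin/bash read /etc/passwd",
        "unparseable line",
    ],
}
graph = build_graph(logs)
for edge in query(graph, 100.0, 103.0):
    print(edge.ts, edge.subject, edge.action, edge.obj, f"[{edge.source}]")
```

In this sketch the node `alice` carries edges from both the `auth` and `proc` logs, which is the simplest possible form of cross-source merging. The time-window `query` only hints at temporal graph querying and ignores log generation delays.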