Event-based datasets are crucial for cybersecurity analysis. A key use case is detecting event-based signatures: attacks that span multiple events and can only be understood once the relevant events have been identified and linked. Analysing event datasets is essential for monitoring system security, but their growing volume and frequency create significant scalability and processing difficulties. Researchers rely on these datasets to develop and test techniques for automatically identifying signatures. However, because real datasets are security-sensitive and rarely shared, meaningful comparative evaluation of different approaches is difficult. This work addresses that limitation by offering a systematic method for generating event logs with known ground truth, enabling reproducible and comparable research. We present a novel parametrised generation technique that produces synthetic event datasets containing event-based signatures for discovery. To demonstrate the technique, we provide a benchmark for signature detection. In this benchmark, DBSCAN proved well suited to the task, achieving an Adjusted Rand Index above 0.95 on most generated datasets. This work enhances the ability of researchers to develop and benchmark new cybersecurity techniques, ultimately contributing to more robust and effective cybersecurity measures.
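As a rough illustration of how a clustering result can be scored against known ground truth, the sketch below computes the Adjusted Rand Index from two label lists using only the Python standard library. It is not the benchmark code from this work: the function name `adjusted_rand_index`, the toy labels, and the choice to treat every label (including any noise label) as an ordinary cluster are assumptions made for illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, predicted):
    """Adjusted Rand Index between two labelings of the same events.

    Hypothetical helper for illustration: `truth` holds the ground-truth
    signature id of each event, `predicted` the cluster id assigned by a
    detector. Label values only need to be hashable.
    """
    if len(truth) != len(predicted):
        raise ValueError("label lists must have the same length")
    n = len(truth)
    if n < 2:
        return 1.0  # no pairs to disagree on

    # Pairs of events grouped together in both, in truth, and in predicted.
    both = sum(comb(c, 2) for c in Counter(zip(truth, predicted)).values())
    in_truth = sum(comb(c, 2) for c in Counter(truth).values())
    in_pred = sum(comb(c, 2) for c in Counter(predicted).values())

    expected = in_truth * in_pred / comb(n, 2)  # agreement expected by chance
    maximum = (in_truth + in_pred) / 2          # agreement of a perfect match
    if maximum == expected:
        return 1.0  # both partitions are trivial and identical
    return (both - expected) / (maximum - expected)


# Toy example (made-up labels): three signatures, one event misassigned.
truth = [0, 0, 0, 1, 1, 1, 2, 2]
predicted = [5, 5, 5, 7, 7, 7, 7, 9]
score = adjusted_rand_index(truth, predicted)
perfect = adjusted_rand_index(truth, [9, 9, 9, 4, 4, 4, 1, 1])
```

Because the index depends only on which events are grouped together, a detector that recovers every signature under different cluster ids still scores 1.0, which is what makes it usable when generated ground-truth ids and detector output ids do not line up.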