As cyber threats continue to evolve in sophistication and scale, the ability to detect anomalous network behavior has become critical for maintaining robust cybersecurity defenses. Modern cybersecurity systems must analyze billions of daily network interactions to identify potential threats, which makes efficient and accurate anomaly detection algorithms essential for network defense. This paper investigates variants of the Isolation Forest (iForest) machine learning algorithm for detecting anomalies in internet scan data. In particular, it presents the Set-Partitioned Isolation Forest (siForest), a novel extension of iForest designed to detect anomalies in set-structured data. By treating a set of instances, such as multiple network scans that share an IP address, as a single cohesive unit, siForest addresses some of the challenges of analyzing complex, multidimensional datasets. Extensive experiments on synthetic datasets that simulate diverse anomaly scenarios in network traffic indicate that siForest has the potential to outperform traditional approaches on some types of internet scan data.