Detecting malicious behavior in a large network is a challenging problem for machine learning in computer security, since it requires a model with both high expressive power and scalable inference. Existing solutions struggle to meet both requirements: current approaches tailored to cybersecurity are still limited in expressivity, and methods successful in other domains do not scale well to large volumes of data, rendering frequent retraining impossible. This work proposes a new perspective for learning from graph data: modeling network entity interactions as a large heterogeneous graph. High expressivity is achieved with the neural network architecture HMILnet, which naturally models this type of data and provides theoretical guarantees. Scalability is achieved by pursuing local graph inference, i.e., classifying individual vertices and their neighborhoods as independent samples. Our experiments show an improvement over the state-of-the-art Probabilistic Threat Propagation (PTP) algorithm, a further threefold accuracy improvement when additional data is used, which is not possible with the PTP algorithm, and the ability of the method to generalize to new, previously unseen entities.
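To make the idea of local graph inference more concrete, the following is a minimal sketch of how a single vertex and its one-hop neighborhood might be extracted from a heterogeneous graph as an independent sample. It is an illustration under stated assumptions, not the method's actual implementation: the edge triples, the relation names, the `inv_` prefix for reverse edges, and the helper names `build_index` and `local_sample` are all hypothetical, and the resulting bags of neighbors would still need to be passed to a model such as HMILnet.

```python
from collections import defaultdict

# Hypothetical toy graph: (source, relation, target) triples.
# Entity and relation names are illustrative assumptions only.
EDGES = [
    ("client-1", "resolves", "good.example"),
    ("client-1", "resolves", "bad.example"),
    ("client-2", "resolves", "bad.example"),
    ("bad.example", "hosted_on", "192.0.2.7"),
    ("other.example", "hosted_on", "192.0.2.7"),
]


def build_index(edges):
    """Index edges by vertex in both directions so lookups stay local."""
    index = defaultdict(lambda: defaultdict(set))
    for src, relation, dst in edges:
        index[src][relation].add(dst)
        # Reverse edges get an "inv_" prefix so direction is preserved.
        index[dst]["inv_" + relation].add(src)
    return index


def local_sample(index, vertex):
    """Return the 1-hop neighborhood of `vertex` as one bag per relation."""
    neighborhood = index.get(vertex, {})
    return {rel: sorted(neighbors) for rel, neighbors in neighborhood.items()}


index = build_index(EDGES)
sample = local_sample(index, "bad.example")
print(sample)
```

Because each sample depends only on the vertex and its neighbors, samples can be built and classified independently of the rest of the graph, which is what allows inference cost to scale with neighborhood size rather than with the size of the whole network.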