Devices in computer networks cannot operate without essential network services provided by a limited number of devices. Identification of device dependencies determines whether a pair of IP addresses forms a dependency, i.e., whether the host with the first IP address depends on the host with the second. Such dependencies cannot be identified manually in large and dynamically changing networks. Nevertheless, knowing them is important because of possible unexpected failures, performance issues, and cascading effects. We address the identification of dependencies with a new approach based on graph-based machine learning. The approach is a form of link prediction based on a latent representation of the computer network's communication graph. It samples random walks over IP addresses that fulfill the time conditions imposed on network dependencies. A neural network uses these constrained random walks to construct an IP address embedding, a space in which IP addresses that often appear close together in the same communication chain (i.e., random walk) are placed near each other. A dependency embedding is constructed by combining the embedding values of the two IP addresses and is used to train the resulting dependency classifier. We evaluated the approach using IP flow datasets from a controlled environment and a university campus network that contain evidence about dependencies. The evaluation of correctness and the comparison with other approaches show that the approach achieves acceptable performance. It can consider all types of dependencies simultaneously and is applicable to batch processing in operational conditions.
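The sampling and combination steps can be illustrated with a minimal sketch. It is not the authors' implementation: the flow tuple layout `(src, dst, timestamp)`, the helper names, the concrete time condition (each hop must start no earlier than the previous one and within `max_gap` seconds of it), and the element-wise (Hadamard) product used to combine two IP address vectors are all assumptions made for illustration. The neural embedding step itself is omitted; the sampled walks would serve as its training input.

```python
import random
from collections import defaultdict


def build_index(flows):
    """Group flows by source IP: src -> list of (timestamp, dst), sorted by time."""
    index = defaultdict(list)
    for src, dst, ts in flows:
        index[src].append((ts, dst))
    for edges in index.values():
        edges.sort()
    return index


def constrained_walk(index, start, start_time, max_len, max_gap, rng):
    """Sample one walk whose hops respect an assumed time condition.

    A hop is allowed only along a flow that starts no earlier than the
    previous hop and at most `max_gap` seconds after it.
    """
    walk = [start]
    node, now = start, start_time
    while len(walk) < max_len:
        candidates = [
            (ts, dst)
            for ts, dst in index.get(node, [])
            if now <= ts <= now + max_gap
        ]
        if not candidates:
            break  # no flow satisfies the time condition; the walk ends here
        now, node = rng.choice(candidates)
        walk.append(node)
    return walk


def hadamard(u, v):
    """Combine two IP address vectors into one dependency vector (assumed operator)."""
    return [a * b for a, b in zip(u, v)]


# Hypothetical flows: a client contacts a web server, which then contacts a
# database shortly afterwards and a backup host much later.
flows = [
    ("10.0.0.5", "10.0.0.1", 0.0),
    ("10.0.0.1", "10.0.0.2", 0.2),
    ("10.0.0.1", "10.0.0.3", 50.0),
]
index = build_index(flows)
walk = constrained_walk(index, "10.0.0.5", 0.0, max_len=4, max_gap=1.0,
                        rng=random.Random(0))
print(walk)
```

With these hypothetical flows, the walk can reach the database host but not the backup host, because the backup flow starts outside the assumed time window. In a full pipeline, many such walks would train the IP address embedding, and `hadamard` would be applied to the vectors of each candidate pair before classification.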