Network data, commonly used throughout the physical, social, and biological sciences, consists of nodes (individuals) and the edges (interactions) between them. One way to represent the complex, high-dimensional structure of network data is to embed the graph into a low-dimensional geometric space. The curvature of this space, in particular, provides insight into the structure of the graph, such as its propensity to form triangles or to exhibit tree-like structure. We derive an estimating function for curvature based on the side lengths of a triangle and the distance from the midpoint of one side to the opposing corner. We construct an estimator whose only input is a distance matrix and establish its asymptotic normality. We then introduce a novel latent distance matrix estimator for networks, together with an efficient algorithm that computes the estimate by iteratively solving quadratic programs. We apply this method to the Los Alamos National Laboratory Unified Network and Host dataset and show that curvature estimates can detect a red-team attack faster than naive methods; we also discover non-constant latent curvature in physics co-authorship networks. The code for this paper is available at https://github.com/SteveJWR/netcurve, and the methods are implemented in the R package lolaR, available at https://github.com/SteveJWR/lolaR.
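To make the idea of a midpoint-based estimating function concrete, the following is a minimal sketch, not the implementation in netcurve or lolaR. It assumes the classical median identity for a space of constant curvature κ, which relates the three side lengths of a triangle x, y, z to the distance from x to the midpoint m of the side yz, and it may differ in form from the estimating function derived in the paper. The names `median_residual` and `estimate_curvature` are hypothetical, and plain bisection stands in for whatever solver the authors use.

```python
import math


def median_residual(kappa, d_xy, d_xz, d_yz, d_xm):
    """Residual of the median identity; zero when the distances fit curvature kappa.

    Dividing by kappa makes the residual continuous at kappa = 0, where it
    reduces to the Euclidean (Apollonius) median formula.
    """
    if abs(kappa) < 1e-9:
        return (d_xy**2 + d_xz**2) / 2 - d_yz**2 / 4 - d_xm**2
    s = math.sqrt(abs(kappa))
    # cos for positive curvature (sphere), cosh for negative (hyperbolic).
    c = math.cos if kappa > 0 else math.cosh
    num = 2 * c(s * d_yz / 2) * c(s * d_xm) - c(s * d_xy) - c(s * d_xz)
    return num / kappa


def estimate_curvature(d_xy, d_xz, d_yz, d_xm, lo=-2.0, hi=2.0, tol=1e-10):
    """Find a root of the residual in [lo, hi] by bisection (assumed bracket)."""
    g_lo = median_residual(lo, d_xy, d_xz, d_yz, d_xm)
    g_hi = median_residual(hi, d_xy, d_xz, d_yz, d_xm)
    if g_lo * g_hi > 0:
        raise ValueError("residual does not change sign on the bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = median_residual(mid, d_xy, d_xz, d_yz, d_xm)
        if g_lo * g_mid <= 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return (lo + hi) / 2


def sphere_dist(u, v):
    """Geodesic distance between unit vectors on the unit sphere (kappa = 1)."""
    dot = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, dot)))


# Toy check: a triangle on the unit sphere, built from 3D coordinates.
x = (0.0, 0.0, 1.0)
y = (math.sin(1.0), 0.0, math.cos(1.0))
z = (0.0, math.sin(1.0), math.cos(1.0))
mid_raw = tuple((a + b) / 2 for a, b in zip(y, z))
norm = math.sqrt(sum(a * a for a in mid_raw))
m = tuple(a / norm for a in mid_raw)  # geodesic midpoint of y and z

kappa_hat = estimate_curvature(
    sphere_dist(x, y), sphere_dist(x, z), sphere_dist(y, z), sphere_dist(x, m)
)
print(kappa_hat)
```

On this toy triangle the estimate should come out close to 1, the curvature of the unit sphere. The bracket `[-2, 2]` is an arbitrary choice for the example; for positive curvature the identity is only meaningful while every distance stays below π/√κ, so real data would need a bracket chosen with the scale of the distance matrix in mind.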