The Erd\H{o}s-R\'enyi random graph is the simplest model for node degree distribution, and it is one of the most widely studied. In this model, each pair of the $n$ vertices is connected independently with probability $p$; consequently, the degree of any given vertex follows a binomial distribution. When the number of vertices is large, the binomial can be approximated by a normal distribution via the Central Limit Theorem, a common rule of thumb being $\min (np, n(1-p)) > 5$. This holds for each node marginally. However, because the degrees of the nodes in a graph are not independent, we aim in this paper to test whether the degrees of all nodes in the Erd\H{o}s-R\'enyi graph jointly follow a multivariate normal (MVN) distribution. A chi-square goodness-of-fit test rejects the hypothesis that the binomial is the distribution of the whole set of node degrees, because of the dependence between degrees. Before testing MVN, we show that the covariance and correlation between the degrees of any pair of nodes in the graph are $p(1-p)$ and $1/(n-1)$, respectively. We test MVN under two assumptions, independent and dependent degrees, and base our results on the rejection percentages of the chi-square test, the $p$-values of the Anderson-Darling test, and a CDF comparison. We consistently obtain a good MVN fit for large values of $n$ and $p$, and a very poor fit when $n$ or $p$ is very small. The approximation seems valid when $np \geq 10$. We also compare the maximum likelihood estimates of $p$ under the MVN distribution, once assuming independence and once assuming dependence. The estimators are assessed using bias, variance, and mean squared error.
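The stated covariance $p(1-p)$ and correlation $1/(n-1)$ can be checked numerically. The following is a minimal Monte Carlo sketch, not the paper's own procedure: the function names `sample_degrees` and `degree_moments`, the use of NumPy, and the defaults for `reps` and `seed` are all assumptions made for illustration. It averages the empirical covariance over all pairs of nodes, which is reasonable here because the nodes are exchangeable.

```python
import numpy as np


def sample_degrees(n, p, rng):
    """Degree vector of one Erdos-Renyi G(n, p) graph (hypothetical helper)."""
    # Draw each of the n(n-1)/2 possible edges independently with probability p.
    upper = np.triu(rng.random((n, n)) < p, k=1)
    adj = upper | upper.T  # symmetric adjacency matrix, no self-loops
    return adj.sum(axis=1)


def degree_moments(n, p, reps=20000, seed=0):
    """Estimate the covariance and correlation between degrees of two distinct nodes."""
    rng = np.random.default_rng(seed)
    degs = np.array([sample_degrees(n, p, rng) for _ in range(reps)])
    cov_matrix = np.cov(degs, rowvar=False)  # n x n covariance of the degree vector
    off_diag = cov_matrix[~np.eye(n, dtype=bool)].mean()  # average pairwise covariance
    var = np.diag(cov_matrix).mean()  # average degree variance
    return off_diag, off_diag / var


if __name__ == "__main__":
    n, p = 10, 0.3
    cov, corr = degree_moments(n, p)
    print("empirical covariance:", cov, "theory:", p * (1 - p))
    print("empirical correlation:", corr, "theory:", 1 / (n - 1))
```

The correlation follows from the covariance because each degree is a sum of $n-1$ independent edge indicators, so its variance is $(n-1)p(1-p)$, while two distinct nodes share exactly one potential edge.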