Suppose that there is an unknown underlying graph $G$ on a large vertex set, and we can test only a proportion of the possible edges to check whether they are present in $G$. If $G$ has high modularity, is the observed graph $G'$ likely to have high modularity? We show that this is indeed the case under a mild condition, in a natural model where we test edges at random. We find that $q^*(G') \geq q^*(G)-\varepsilon$ with probability at least $1-\varepsilon$, as long as the expected number of edges in $G'$ is large enough. Similarly, $q^*(G') \leq q^*(G)+\varepsilon$ with probability at least $1-\varepsilon$, under the stronger condition that the expected average degree in $G'$ is large enough. Further, under this stronger condition, finding a good partition for $G'$ helps us to find a good partition for $G$. We also consider the vertex sampling model for partially observing the underlying graph: for dense underlying graphs we may estimate the modularity by sampling a constant number of vertices and observing the corresponding induced subgraph, but this does not hold for underlying graphs with a subquadratic number of edges. Finally, we deduce some related results, for example showing that under-sampling tends to lead to overestimation of modularity.
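For orientation, the quantities above can be written out as follows. This is a sketch using the standard Newman-Girvan definition of modularity, which the abstract does not restate. The symbols $\mathcal{A}$ (a vertex partition), $e(A)$ (number of edges inside part $A$), $\mathrm{vol}(A)$ (sum of the degrees of the vertices in $A$), $m = e(G) \geq 1$, $n$ (number of vertices) and $p$ are notation assumed here, as is the reading of "test edges at random" as testing each possible edge independently with the same probability $p$.

$$q_{\mathcal{A}}(G) = \sum_{A \in \mathcal{A}} \left( \frac{e(A)}{m} - \left( \frac{\mathrm{vol}(A)}{2m} \right)^{2} \right), \qquad q^*(G) = \max_{\mathcal{A}} q_{\mathcal{A}}(G).$$

Under that assumed model, each edge of $G$ appears in $G'$ with probability $p$, so

$$\mathbb{E}\left[ e(G') \right] = p\,m, \qquad \mathbb{E}\left[ \frac{2\,e(G')}{n} \right] = \frac{2pm}{n}.$$

In this notation, the condition for the lower bound is that $pm$ is large, while the upper bound needs the stronger condition that $2pm/n$ is large.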