Graph sampling plays an important role in data mining for large networks. In practice, the larger the network, the lower the sampling rate tends to be. At such low rates, traditional traversal-based sampling methods usually show an excessive preference for the densely connected core nodes of the network. To address this issue, this paper proposes SLSR, a sampling method for unknown networks at low sampling rates. SLSR first uses random node sampling to estimate two quantities of the unknown network: a degree threshold, which separates the core from the periphery, and the average degree. It then runs a double-layer sampling strategy on the core and the periphery. SLSR is simple, which makes it time-efficient, yet experimental evaluation confirms that it accurately preserves many critical structures of large unknown networks at low sampling rates and with low variance.
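The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the threshold is computed or how the double-layer strategy works, so the code below is not the authors' algorithm. It rests on loudly stated assumptions: the threshold is taken as an empirical quantile of the probed degrees, and the two layers are modeled as a breadth-first expansion from the probes with a cap on how many core nodes may enter the sample. The names `estimate_threshold`, `two_layer_sample`, `quantile`, and `core_share` are hypothetical.

```python
import random
from collections import deque
from statistics import mean


def estimate_threshold(adj, n_probe, quantile, rng):
    """Stage 1: probe random nodes; return (probes, degree threshold, average degree)."""
    probes = rng.sample(sorted(adj), min(n_probe, len(adj)))
    degrees = sorted(len(adj[v]) for v in probes)
    # Assumption: the core/periphery threshold is an empirical quantile
    # of the probed degrees. The paper may define it differently.
    threshold = degrees[min(len(degrees) - 1, int(quantile * len(degrees)))]
    return probes, threshold, mean(degrees)


def two_layer_sample(adj, budget, n_probe=10, quantile=0.8, core_share=0.3, seed=0):
    """Stage 2 (hypothetical): expand from the probes, capping the core layer."""
    rng = random.Random(seed)
    probes, threshold, avg_degree = estimate_threshold(adj, n_probe, quantile, rng)
    core_cap = int(core_share * budget)  # max number of core nodes in the sample
    sampled, n_core = set(), 0
    queue = deque(probes)
    while queue and len(sampled) < budget:
        v = queue.popleft()
        if v in sampled:
            continue
        is_core = len(adj[v]) >= threshold
        if is_core and n_core >= core_cap:
            continue  # core layer is full; skip to limit the bias toward hubs
        sampled.add(v)
        n_core += int(is_core)
        neighbors = sorted(adj[v])
        rng.shuffle(neighbors)
        queue.extend(u for u in neighbors if u not in sampled)
    return sampled, threshold, avg_degree
```

Here `adj` is assumed to be a dictionary mapping each node to the set of its neighbors. Capping the core layer is one simple way to counter the preference for high-degree nodes that plain traversal exhibits. The sample may come out smaller than `budget` when the cap blocks further expansion.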