Graph sampling plays an important role in data mining for large networks. Larger networks often have to be sampled at lower rates. In this setting, traditional traversal-based sampling methods tend to over-sample the densely connected core nodes of the network. To address this issue, this paper proposes SLSR, a sampling method for unknown networks at low sampling rates. SLSR first uses random node sampling to estimate two quantities of the unknown network: its average degree and a degree threshold that separates the core from the periphery. It then runs a double-layer sampling strategy on the core and the periphery. SLSR is simple and therefore time-efficient, and experimental evaluation confirms that it accurately preserves many critical structures of unknown large networks, with low variance, at low sampling rates.
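The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not say how the threshold is derived or how the double-layer strategy allocates samples, so the quantile rule (`core_quantile`), the budget split (`core_share`), the frontier expansion, and all function names below are assumptions made only for illustration. The graph is assumed to be an adjacency dictionary mapping each node to its neighbors.

```python
import random
from statistics import mean


def estimate_degree_stats(adj, pilot_size, core_quantile=0.9, rng=random):
    """Stage 1: estimate the average degree and a core/periphery degree
    threshold from a uniform random node sample (the pilot)."""
    pilot = rng.sample(list(adj), pilot_size)
    degrees = sorted(len(adj[v]) for v in pilot)
    avg_degree = mean(degrees)
    # Assumption: the threshold is a high quantile of the pilot degrees.
    idx = min(len(degrees) - 1, int(core_quantile * len(degrees)))
    return pilot, avg_degree, degrees[idx]


def two_layer_sample(adj, budget, pilot_size, core_share=0.3, seed=0):
    """Stage 2: expand from the pilot nodes, keeping a separate quota for
    core and periphery nodes so that the core cannot absorb the budget."""
    rng = random.Random(seed)
    pilot, avg_degree, threshold = estimate_degree_stats(adj, pilot_size, rng=rng)

    # Assumption: a fixed share of the budget is reserved for the core.
    quota = {"core": int(budget * core_share)}
    quota["periphery"] = budget - quota["core"]
    sampled = {"core": set(), "periphery": set()}

    frontier = list(pilot)
    seen = set(pilot)
    while frontier and any(len(sampled[k]) < quota[k] for k in quota):
        # Visit a random frontier node and assign it to a layer by degree.
        v = frontier.pop(rng.randrange(len(frontier)))
        layer = "core" if len(adj[v]) >= threshold else "periphery"
        if len(sampled[layer]) < quota[layer]:
            sampled[layer].add(v)
        # Discover unseen neighbors; each node enters the frontier once.
        for u in adj[v]:
            if u not in seen:
                seen.add(u)
                frontier.append(u)
    return sampled, avg_degree, threshold
```

Calling `two_layer_sample(adj, budget=20, pilot_size=20)` returns the two sampled node sets together with the estimated average degree and threshold. Keeping a separate quota per layer is one simple way to counter the core bias of traversal-based sampling; the actual SLSR procedure may differ.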