Graph sampling plays an important role in data mining on large networks. In practice, larger networks are often sampled at lower rates. At such low rates, traditional traversal-based sampling methods tend to over-sample the densely connected core nodes of large networks. To address this issue, this paper proposes SLSR, a sampling method for unknown networks at low sampling rates. SLSR first applies random node sampling to estimate the average degree of the unknown network and a degree threshold that separates core nodes from periphery nodes. It then runs a double-layer sampling strategy over the core and the periphery. SLSR is simple and time-efficient, and experimental evaluation confirms that it accurately preserves many critical structural properties of unknown large networks at sampling rates not exceeding 10%.
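The first phase of SLSR, and the traversal bias that motivates it, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the `core_fraction` parameter, and the rule "threshold = smallest degree among the top `core_fraction` share of sampled degrees" are hypothetical, because the abstract does not say how the threshold is derived. The sketch also assumes that nodes can be drawn uniformly at random and their degrees queried, and it does not model the double-layer sampling phase.

```python
import random
from statistics import mean


def estimate_degree_stats(adjacency, sample_size, core_fraction=0.1, seed=0):
    """Phase one (sketch): estimate the average degree and a core/periphery
    degree threshold from a uniform random sample of nodes.

    Hypothetical assumptions (not taken from the paper):
      * nodes can be drawn uniformly at random and their degree queried;
      * the threshold is the smallest degree among the top `core_fraction`
        share of sampled degrees.
    """
    rng = random.Random(seed)
    nodes = list(adjacency)
    sampled = rng.sample(nodes, min(sample_size, len(nodes)))
    degrees = sorted(len(adjacency[v]) for v in sampled)
    # Uniform node sampling gives an unbiased estimate of the mean degree.
    avg_degree = mean(degrees)
    # Number of sampled nodes treated as core (at least one).
    k = max(1, round(core_fraction * len(degrees)))
    threshold = degrees[-k]
    return avg_degree, threshold


def is_core(adjacency, node, threshold):
    """Classify a visited node as core if its degree reaches the threshold."""
    return len(adjacency[node]) >= threshold


def random_walk_degrees(adjacency, start, steps, seed=0):
    """Degrees of the nodes reached by a simple random walk (traversal baseline).

    In the long run a simple random walk on a connected undirected graph
    visits each node in proportion to its degree, which produces the
    high-degree (core) bias of traversal-based sampling.
    """
    rng = random.Random(seed)
    current, seen = start, []
    for _ in range(steps):
        current = rng.choice(adjacency[current])
        seen.append(len(adjacency[current]))
    return seen


# Toy example: a star graph with hub 0 and 99 leaves.
star = {0: list(range(1, 100))}
star.update({leaf: [0] for leaf in range(1, 100)})

avg, thr = estimate_degree_stats(star, sample_size=100, core_fraction=0.01)
print(avg, thr)                                        # 1.98 99
print(is_core(star, 0, thr), is_core(star, 5, thr))    # True False
print(mean(random_walk_degrees(star, 1, 10)))          # 50
```

On the star graph, uniform node sampling recovers the true average degree of 1.98. A 10-step random walk instead reports a mean visited degree of 50, because it returns to the hub on every other step.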