Community detection has become an important problem with the rapid growth of social networks. The Medoid-Shift algorithm preserves the benefits of Mean-Shift and can be applied to problems that are defined by a distance matrix, such as community detection. One drawback of Medoid-Shift is that the neighborhood region defined by its distance parameter may contain no data points. To handle the community detection problem better, this work proposes a new algorithm called Revised Medoid-Shift (RMS). When searching for the next medoid, RMS uses a neighborhood defined by k-nearest neighbors (KNN), whereas the original Medoid-Shift uses a neighborhood defined by a distance parameter. Because a KNN neighborhood always contains the same number of data points, it is more stable than a distance-based neighborhood in this respect, and the RMS algorithm may therefore converge more smoothly. In the RMS method, each data point is shifted towards a medoid within its KNN neighborhood. After the iterative shifting process, each data point converges to a cluster center, and the data points that converge to the same center are grouped into the same cluster. The RMS algorithm is tested on two kinds of datasets: community datasets with a known ground-truth partition and community datasets without one. The experimental results show that the proposed RMS algorithm generally produces better results than Medoid-Shift, several state-of-the-art algorithms, and most classic community detection algorithms on different kinds of community detection datasets.
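The shifting procedure described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `knn_medoid_shift`, the choice of the medoid as the neighbor with the smallest summed distance to the other neighbors, the tie-breaking by index order, and the handling of pointer cycles are all assumptions made for illustration. The input is assumed to be a symmetric distance matrix; how such a matrix is derived from a network is not specified here.

```python
import numpy as np

def knn_medoid_shift(dist, k):
    """Illustrative KNN-based medoid shifting on a distance matrix (a sketch)."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]

    # KNN neighborhood of every point. With a zero self-distance and no
    # duplicate points, each point is its own nearest neighbor.
    neighbors = np.argsort(dist, axis=1, kind="stable")[:, :k]

    # Next medoid of point i: the neighbor whose summed distance to all
    # members of i's neighborhood is smallest (assumed medoid criterion).
    nxt = np.empty(n, dtype=int)
    for i in range(n):
        nb = neighbors[i]
        cost = dist[np.ix_(nb, nb)].sum(axis=1)
        nxt[i] = nb[np.argmin(cost)]

    # Follow the medoid pointers from every point until a node repeats.
    # A fixed point is a cycle of length one; for longer cycles the
    # smallest index in the cycle is used as the center (an assumption).
    labels = np.empty(n, dtype=int)
    for i in range(n):
        path = []
        j = i
        while j not in path:
            path.append(j)
            j = int(nxt[j])
        labels[i] = min(path[path.index(j):])
    return labels

# Toy example: six points on a line forming two well-separated groups.
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
toy_dist = np.abs(points[:, None] - points[None, :])
print(knn_medoid_shift(toy_dist, k=3).tolist())  # [1, 1, 1, 4, 4, 4]
```

In the toy example every neighborhood contains exactly `k` points, so a next medoid is always defined. With a distance-threshold neighborhood, a point far from all others could have an empty neighborhood, which is the drawback of Medoid-Shift noted above.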