K-Means clustering with Lloyd's algorithm is an iterative method for partitioning a dataset into K clusters. It assigns each point to a cluster so as to minimize the objective function \[ \min \sum_{i=1}^{n} \|x_i-\mu_{x_i}\|^2 \] where \(\mu_{x_i}\) denotes the centroid of the cluster to which \(x_i\) is assigned. The serial algorithm alternates between two steps: it computes the distance from each data point to every centroid and assigns the point to the nearest one, then recomputes each centroid from the points assigned to it. This alternation is closely related to the expectation-maximization (EM) procedure. The distance calculations in each iteration are computationally expensive, and their cost grows with the number of data points. Because the distance calculations for different points are independent of one another, this offers scope for parallelism. However, a parallel implementation must ensure that every thread sees the updated centroid values and that no race condition occurs on any centroid. In this project we compare two approaches. The first is a flat synchronous OpenMP method in which all threads run in parallel and synchronization ensures safe updates of the clusters. The second is a GPU-based approach using OpenACC, in which we exploit the GPU architecture to parallelize parts of the algorithm with the aim of reducing computation time. We analyze metrics such as speedup, efficiency, and time taken while varying the number of data points and the number of threads, in order to compare the two approaches and quantify the relative performance improvement.
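To make the assignment step and the race-condition concern concrete, the following is a minimal sketch of one Lloyd iteration in C with OpenMP. It is an illustration under stated assumptions, not the implementation evaluated in this project: the function name `kmeans_step`, the row-major array layout, and the choice of `#pragma omp atomic` for the shared accumulators are all hypothetical. Centroids are only read inside the parallel loop and are overwritten after it ends, so every thread works from the same centroid values.

```c
#include <assert.h>
#include <float.h>

/* One Lloyd iteration (hypothetical helper, for illustration only).
 * points:    n*d values, row-major (point i occupies points[i*d .. i*d+d-1])
 * centroids: k*d values, row-major, updated in place
 * labels:    n cluster indices, updated in place
 * sums:      k*d scratch values; counts: k scratch values
 * Returns the number of points whose label changed. */
int kmeans_step(const double *points, int n, int d,
                double *centroids, int k,
                int *labels, double *sums, int *counts)
{
    assert(n > 0 && d > 0 && k > 0);
    int changed = 0;

    for (int j = 0; j < k * d; ++j) sums[j] = 0.0;
    for (int c = 0; c < k; ++c) counts[c] = 0;

    /* Assignment step: each point is handled independently.
     * Without OpenMP enabled the pragmas are ignored and the loop runs serially. */
    #pragma omp parallel for reduction(+:changed)
    for (int i = 0; i < n; ++i) {
        int best = 0;
        double best_dist = DBL_MAX;
        for (int c = 0; c < k; ++c) {
            double dist = 0.0;
            for (int j = 0; j < d; ++j) {
                double diff = points[i * d + j] - centroids[c * d + j];
                dist += diff * diff;   /* squared Euclidean distance */
            }
            if (dist < best_dist) { best_dist = dist; best = c; }
        }
        if (labels[i] != best) { labels[i] = best; ++changed; }

        /* Shared accumulators: atomic updates avoid races between threads. */
        for (int j = 0; j < d; ++j) {
            #pragma omp atomic
            sums[best * d + j] += points[i * d + j];
        }
        #pragma omp atomic
        counts[best] += 1;
    }

    /* Update step: runs after the parallel loop's implicit barrier.
     * Empty clusters keep their previous centroid. */
    for (int c = 0; c < k; ++c) {
        if (counts[c] > 0) {
            for (int j = 0; j < d; ++j)
                centroids[c * d + j] = sums[c * d + j] / counts[c];
        }
    }
    return changed;
}
```

A caller would invoke `kmeans_step` repeatedly until it returns 0 or an iteration limit is reached. Atomic updates are only one way to synchronize the accumulators; per-thread partial sums merged after the loop are a common alternative that trades memory for less contention.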