Machine learning approaches such as clustering must handle increasingly massive datasets, which poses a growing computational challenge. We devise parallel algorithms to compute Multi-Slice Clustering (MSC) for third-order tensors. The MSC method is based on spectral analysis of the tensor slices and works independently on each tensor mode. These features make it well suited to parallel execution on a distributed-memory system. We show that our parallel scheme outperforms the sequential computation and allows the MSC method to scale.
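To illustrate why a per-slice, per-mode spectral analysis parallelizes naturally, the following is a minimal sketch, not the authors' implementation. It assumes that the spectral step amounts to taking the leading eigenpair of each slice's Gram matrix; the function names `top_eigenpairs_per_slice` and `spectral_summary_all_modes` are hypothetical. The sketch runs serially, but every call for a given mode (and every iteration over slices) is independent of the others, so each could in principle be assigned to a separate process in a distributed-memory setting.

```python
import numpy as np


def top_eigenpairs_per_slice(tensor, mode):
    """Leading eigenpair of slice.T @ slice for every slice along `mode`.

    Assumption: this Gram-matrix eigenpair stands in for the per-slice
    spectral analysis; the actual MSC computation may differ.
    """
    # Bring the chosen mode to the front: shape (n_slices, rows, cols).
    slices = np.moveaxis(tensor, mode, 0)
    n_slices, _, cols = slices.shape
    eigenvalues = np.empty(n_slices)
    eigenvectors = np.empty((n_slices, cols))
    for i, s in enumerate(slices):
        gram = s.T @ s                    # symmetric positive semidefinite
        w, v = np.linalg.eigh(gram)       # eigenvalues in ascending order
        eigenvalues[i] = w[-1]            # largest eigenvalue
        eigenvectors[i] = v[:, -1]        # its unit-norm eigenvector
    return eigenvalues, eigenvectors


def spectral_summary_all_modes(tensor):
    """Run the per-slice analysis on each mode; the modes do not interact."""
    return {mode: top_eigenpairs_per_slice(tensor, mode)
            for mode in range(tensor.ndim)}
```

Because no mode's result depends on another's, the dictionary comprehension in `spectral_summary_all_modes` is the natural place to distribute work, for example one group of processes per mode with slices split among the processes in each group.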