Multi-view learning has drawn growing attention in data science and machine learning as data with multiple views becomes increasingly common in real-world applications, especially in networks. In this paper, we introduce a new scalable framework for multi-view subspace clustering. We propose an efficient optimization strategy that leverages kernel feature maps to reduce the computational burden while maintaining good clustering performance. The algorithm scales to large datasets, including those with millions of data points, which it can process in a few minutes on a standard machine. We conduct extensive experiments on real-world benchmark networks of various sizes to evaluate our algorithm against state-of-the-art multi-view subspace clustering methods and attributed-network multi-view approaches.
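To illustrate why kernel feature maps can reduce the computational burden, the sketch below uses random Fourier features, one common explicit feature map for the RBF kernel. This is an assumption made for illustration only: the abstract does not state which feature map the paper uses, and the function names, the synthetic two-view data, and the parameters `n_features` and `gamma` are all hypothetical. The idea is that each view is mapped to a matrix `Z` of shape `(n, D)` whose inner products approximate the kernel, so the `n x n` kernel matrix never has to be formed.

```python
import numpy as np


def random_fourier_features(X, n_features=2000, gamma=0.5, seed=0):
    """Map X (n, d) to Z (n, n_features) such that Z @ Z.T approximates
    the RBF kernel exp(-gamma * ||x - y||^2). Illustrative helper only."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies drawn from the RBF kernel's spectral density N(0, 2*gamma*I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    # Random phase shifts, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


def exact_rbf_kernel(X, gamma=0.5):
    """Exact n x n RBF kernel, used here only to check the approximation."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def multiview_feature_map(views, n_features=2000, gamma=0.5):
    """Concatenate per-view feature maps, scaled so that Z @ Z.T
    approximates the average of the per-view kernels."""
    maps = [
        random_fourier_features(X, n_features, gamma, seed=v)
        for v, X in enumerate(views)
    ]
    return np.hstack(maps) / np.sqrt(len(views))


if __name__ == "__main__":
    # Two synthetic views of the same 200 samples (hypothetical data).
    rng = np.random.default_rng(42)
    views = [rng.normal(size=(200, 5)), rng.normal(size=(200, 8))]

    Z = multiview_feature_map(views)
    K_approx = Z @ Z.T
    K_exact = np.mean([exact_rbf_kernel(X) for X in views], axis=0)

    print("feature matrix shape:", Z.shape)
    print("mean abs kernel error:", np.abs(K_approx - K_exact).mean())
```

For `n` samples, `V` views, and `D` features per view, storing `Z` takes `O(n * V * D)` memory instead of the `O(n^2)` needed for a full kernel matrix, which is what makes this kind of approximation attractive when `n` reaches the millions. The small exact kernel is computed here only to check the approximation and would be skipped at scale.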