Multi-view clustering (MVC) has achieved significant progress, with many efforts dedicated to learning knowledge from multiple views. However, most existing methods are either not applicable to incomplete multi-view clustering or require additional steps to handle it. This limitation results in degraded clustering performance and poor adaptation to missing views. In addition, noise and outliers can significantly degrade overall clustering performance, and most existing methods do not handle them well. Moreover, most existing methods require category information, which severely affects clustering performance. In this paper, we propose a novel unified framework for incomplete and complete MVC, named self-learning symmetric multi-view probabilistic clustering (SLS-MPC). SLS-MPC introduces a novel symmetric multi-view probability estimation that equivalently transforms the multi-view pairwise posterior matching probability into a composition of each view's individual distribution; this formulation tolerates missing data and can be extended to any number of views. SLS-MPC then introduces a novel self-learning probability function, requiring neither prior knowledge nor hyper-parameters, that learns each view's individual distribution from the perspective of single-view, cross-view, and multi-view consistency. Next, graph-context-aware refinement with path propagation and co-neighbor propagation refines the pairwise probabilities, alleviating the impact of noise and outliers. Finally, SLS-MPC introduces a probabilistic clustering algorithm that iteratively adjusts cluster assignments by maximizing the joint probability, without requiring category information. Extensive experiments on multiple benchmarks for incomplete and complete MVC show that SLS-MPC significantly outperforms previous state-of-the-art methods.
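The abstract does not give the estimation or clustering formulas, so the following is only a minimal hypothetical sketch of the two ideas it names, not the SLS-MPC method itself. It assumes (1) that views are conditionally independent given match / non-match, so per-view pairwise match probabilities combine by adding log-likelihood ratios and a missing view simply drops out of the sum; and (2) that "maximizing the joint probability" can be approximated by greedy coordinate ascent on the pairwise log-likelihood, which never needs the number of clusters. The functions `fuse_views` and `probabilistic_clustering` are illustrative names, and the self-learning probability function and graph-context-aware refinement are omitted.

```python
import math
from typing import List, Optional

# A pairwise probability matrix; None marks a pair that is unobserved in a view.
Matrix = List[List[Optional[float]]]


def logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of p, clipped so that p = 0 or 1 stays finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fuse_views(view_probs: List[Matrix], prior: float = 0.5) -> List[List[float]]:
    """Fuse per-view match probabilities into multi-view log-odds.

    Assumption (not from the paper): views are conditionally independent
    given match / non-match, so each observed view adds its log-likelihood
    ratio logit(p_v) - logit(prior) to the prior log-odds. Unobserved pairs
    (None) contribute nothing, which is how missing views are tolerated here.
    """
    n = len(view_probs[0])
    base = logit(prior)
    fused = [[base] * n for _ in range(n)]
    for probs in view_probs:
        for i in range(n):
            for j in range(n):
                if probs[i][j] is not None:
                    fused[i][j] += logit(probs[i][j]) - base
    return fused


def probabilistic_clustering(log_odds: List[List[float]], max_iter: int = 100) -> List[int]:
    """Greedy coordinate ascent on the joint pairwise log-likelihood.

    Relative to the all-singletons partition, a partition's log-likelihood
    equals the sum of log-odds over same-cluster pairs. Each sample moves to
    the cluster (or a fresh singleton) that strictly increases this sum, so
    the number of clusters is never specified. Expects symmetric log_odds.
    """
    n = len(log_odds)
    labels = list(range(n))  # start with every sample in its own cluster
    for _ in range(max_iter):
        changed = False
        for i in range(n):
            # Summed log-odds between sample i and the members of each cluster.
            scores = {}
            for j in range(n):
                if j != i:
                    scores[labels[j]] = scores.get(labels[j], 0.0) + log_odds[i][j]
            current = labels[i]
            best_label, best_score = current, scores.get(current, 0.0)
            for label, score in scores.items():
                if score > best_score:
                    best_label, best_score = label, score
            if best_score < 0.0:
                # Every option is worse than being alone: open a new cluster.
                best_label = max(labels) + 1
            if best_label != current:
                labels[i] = best_label
                changed = True
        if not changed:
            break
    # Relabel clusters as 0, 1, ... in order of first appearance.
    remap = {}
    return [remap.setdefault(label, len(remap)) for label in labels]


if __name__ == "__main__":
    # Two toy views of four samples; sample 3 is missing from the second view.
    view1 = [[None, 0.9, 0.1, 0.1], [0.9, None, 0.1, 0.1],
             [0.1, 0.1, None, 0.8], [0.1, 0.1, 0.8, None]]
    view2 = [[None, 0.8, 0.2, None], [0.8, None, 0.2, None],
             [0.2, 0.2, None, None], [None, None, None, None]]
    print(probabilistic_clustering(fuse_views([view1, view2])))  # [0, 0, 1, 1]
```

In the toy example, sample 3 is grouped with sample 2 using only the first view's evidence, illustrating how a naive-Bayes style composition lets a missing view contribute nothing rather than requiring imputation. The greedy update only accepts strict improvements of the objective, so it terminates, but it finds a local rather than a global optimum.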