Simple Yet Effective Selective Imputation for Incomplete Multi-view Clustering

Incomplete Multi-view Clustering (IMC) has emerged as a significant challenge in multi-view learning. A predominant line of work for IMC is data imputation; however, indiscriminate imputation can introduce unreliable content. Recently, researchers have proposed selective imputation methods that use a post-imputation assessment strategy: (1) impute all or some missing values, and (2) evaluate their quality through clustering tasks. We observe that this strategy incurs substantial computational cost and depends heavily on the performance of the clustering model. To address these challenges, we are the first to introduce the concept of pre-imputation assessment. We propose an Implicit Informativeness-based Selective Imputation (SI$^3$) method for incomplete multi-view clustering, which explicitly addresses the trade-off between imputation utility and imputation risk. SI$^3$ evaluates the imputation-relevant informativeness of each missing position in a training-free manner, and imputes a position only when sufficient informative support is available. Under a multi-view generative assumption, SI$^3$ further integrates selective imputation into a variational inference framework, enabling uncertainty-aware imputation at the latent distribution level and robust multi-view fusion. Compared with existing selective imputation strategies, SI$^3$ is lightweight, data-driven, and model-agnostic, and can be incorporated into existing incomplete multi-view clustering frameworks as a plug-in strategy. Extensive experiments on multiple benchmark datasets demonstrate that SI$^3$ consistently outperforms both imputation-based and imputation-free methods, particularly under challenging unbalanced missing scenarios.
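
To make the idea of pre-imputation assessment concrete, the following is a minimal sketch of a training-free "score first, then impute selectively" loop. It is not the SI$^3$ informativeness measure, which the abstract does not specify. As a stand-in, it assumes a hypothetical neighbour-support score: for a sample missing in view v, it looks at that sample's k nearest neighbours in every other view where the sample is observed, and counts the fraction of those neighbours that are observed in view v. The function name `select_and_impute`, the parameters `k` and `threshold`, and the mean-of-neighbours fill are all illustrative assumptions.

```python
import math


def select_and_impute(views, k=2, threshold=0.5):
    """Score each missing position before imputing it (illustrative sketch).

    views[v][i] is the feature vector of sample i in view v, or None if missing.
    Returns (imputed_views, decisions), where decisions[(v, i)] = (score, imputed).
    The score is a hypothetical proxy, NOT the measure used by SI^3.
    """
    n = len(views[0])
    result = [list(view) for view in views]
    decisions = {}
    for v, target in enumerate(views):
        for i in range(n):
            if target[i] is not None:
                continue  # position is observed, nothing to assess
            donors = []        # neighbours of i that are observed in view v
            n_neighbours = 0   # total neighbours examined across other views
            for u, ref in enumerate(views):
                if u == v or ref[i] is None:
                    continue
                candidates = [j for j in range(n)
                              if j != i and ref[j] is not None]
                candidates.sort(key=lambda j: math.dist(ref[i], ref[j]))
                nearest = candidates[:k]
                n_neighbours += len(nearest)
                donors += [j for j in nearest if target[j] is not None]
            # Assumed support score: share of neighbours usable as donors.
            score = len(donors) / n_neighbours if n_neighbours else 0.0
            imputed = bool(donors) and score >= threshold
            decisions[(v, i)] = (score, imputed)
            if imputed:
                dim = len(target[donors[0]])
                # Fill with the mean of the donors' original view-v features.
                result[v][i] = [sum(target[j][d] for j in donors) / len(donors)
                                for d in range(dim)]
    return result, decisions
```

In this sketch, positions whose score falls below `threshold` are left as `None`, so a downstream clustering model can treat them as missing rather than consume a low-confidence fill. Donors are always read from the original data, so an imputed value is never reused to impute another position.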
