Clustering in high-dimensional settings with severe feature noise remains challenging, especially when only a small subset of dimensions is informative and the number of clusters is not specified in advance. In such regimes, partition recovery, feature relevance learning, and structural adaptation are tightly coupled, and standard likelihood-based methods can become unstable or overly sensitive to noisy dimensions. We propose DIVI, a data-informed variational clustering framework that combines global feature gating with split-based adaptive structure growth. DIVI uses informative prior initialization to stabilize optimization, learns feature relevance in a differentiable manner, and expands model complexity only when local diagnostics indicate underfitting. Beyond clustering performance, we also examine runtime scalability and parameter sensitivity to clarify the computational and practical behavior of the framework. Empirically, we find that DIVI performs competitively under severe feature noise, remains computationally feasible, and yields interpretable feature-gating behavior, while also exhibiting conservative growth and identifiable failure regimes in challenging settings. Overall, DIVI is best viewed as a practical variational clustering framework for noisy high-dimensional data rather than as a fully Bayesian generative solution.