Clustering mixed-type tabular data is fundamental to exploratory analysis, yet it remains challenging for three reasons: numerical and categorical features are represented in misaligned spaces, feature relevance is uneven and context-dependent, and explanations are typically produced post hoc, disconnected from the clustering process. We propose WISE, a Weight-Informed Self-Explaining framework that unifies representation, feature weighting, clustering, and interpretation in a fully unsupervised and transparent pipeline. WISE introduces Binary Encoding with Padding (BEP) to align heterogeneous features in a unified sparse space, a Leave-One-Feature-Out (LOFO) strategy to identify multiple high-quality and diverse feature-weighting views, and a two-stage weight-aware clustering procedure to aggregate alternative semantic partitions. To ensure intrinsic interpretability, we further develop Discriminative FreqItems (DFI), which yields feature-level explanations that remain consistent from instances to clusters and come with an additive decomposition guarantee. Extensive experiments on six real-world datasets show that WISE consistently outperforms classical and neural baselines in clustering quality while remaining efficient, and that it produces faithful, human-interpretable explanations grounded in the same primitives that drive clustering.
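To make the leave-one-feature-out idea concrete, the following is a minimal sketch under stated assumptions, not the WISE procedure itself. The abstract does not specify how LOFO scores a feature, so the sketch assumes one plausible criterion: a feature matters more when removing it changes the partition more, measured as one minus the Rand index against the full-feature partition. The helper names (`kmeans_labels`, `rand_index`, `lofo_weights`) and the tiny deterministic k-means are hypothetical stand-ins. The sketch also produces a single weight vector rather than the multiple diverse views WISE describes.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(rows, k, iters=20):
    """Tiny deterministic k-means with farthest-point init (stand-in clusterer)."""
    centers = [rows[0]]
    while len(centers) < k:
        # Next center: the row farthest from its nearest already-chosen center.
        centers.append(max(rows, key=lambda r: min(sq_dist(r, c) for c in centers)))
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centers[j])) for r in rows]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old center if a cluster goes empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def rand_index(a, b):
    """Fraction of row pairs on which two partitions agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def lofo_weights(rows, k):
    """Assumed LOFO criterion: weight is proportional to partition change on removal.

    Requires at least two features per row.
    """
    n_features = len(rows[0])
    full = kmeans_labels(rows, k)
    importance = []
    for f in range(n_features):
        # Drop feature f from every row and re-cluster on the remaining features.
        reduced = [tuple(v for i, v in enumerate(r) if i != f) for r in rows]
        importance.append(1.0 - rand_index(full, kmeans_labels(reduced, k)))
    total = sum(importance)
    if total == 0:  # no feature changes the partition: fall back to uniform weights
        return [1.0 / n_features] * n_features
    return [s / total for s in importance]


# Toy data: feature 0 separates two groups, feature 1 is an unrelated binary flag.
toy = [(0.0, 1.0), (0.2, 0.0), (0.1, 1.0), (10.0, 0.0), (10.2, 1.0), (9.9, 0.0)]
weights = lofo_weights(toy, k=2)
print(weights)
```

On this toy input, removing feature 0 changes the partition while removing feature 1 leaves it intact, so the weight concentrates on feature 0. Comparing partitions, rather than comparing an internal quality score across feature subsets, is a deliberate choice in this sketch: internal scores computed in different feature spaces are not directly comparable.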