Unsupervised feature selection (UFS) has recently gained attention for its effectiveness in processing unlabeled high-dimensional data. However, existing methods overlook the intrinsic causal mechanisms within the data, resulting in the selection of irrelevant features and poor interpretability. Additionally, previous graph-based methods fail to account for the differing impacts of non-causal and causal features in constructing the similarity graph, which leads to false links in the generated graph. To address these issues, a novel UFS method, called Causally-Aware UnSupErvised Feature Selection learning (CAUSE-FS), is proposed. CAUSE-FS introduces a novel causal regularizer that reweights samples to balance the confounding distribution of each treatment feature. This regularizer is subsequently integrated into a generalized unsupervised spectral regression model to mitigate spurious associations between features and clustering labels, thus achieving causal feature selection. Furthermore, CAUSE-FS employs causality-guided hierarchical clustering to partition features with varying causal contributions into multiple granularities. By integrating similarity graphs learned adaptively at different granularities, CAUSE-FS increases the importance of causal features when constructing the fused similarity graph to capture the reliable local structure of data. Extensive experimental results demonstrate the superiority of CAUSE-FS over state-of-the-art methods, with its interpretability further validated through feature visualization.
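The sample-reweighting idea behind the causal regularizer can be illustrated with a minimal sketch. This is not the CAUSE-FS objective. It assumes that each treatment feature is binarized at its median, and that imbalance is measured as the squared difference between the weighted means of the remaining features in the treated and control groups. The function names `balancing_loss` and `causal_regularizer`, the median threshold, and the squared-difference form are all hypothetical choices made for illustration.

```python
import numpy as np

def balancing_loss(X, w, treat_idx):
    """Imbalance of the non-treatment features between treated and control samples.

    Assumption: the treatment feature is binarized at its median (hypothetical choice).
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    t = X[:, treat_idx] > np.median(X[:, treat_idx])  # treated-sample mask
    if t.all() or not t.any():
        return 0.0  # one group is empty (e.g. constant feature): nothing to balance
    others = np.delete(X, treat_idx, axis=1)  # remaining features act as confounders
    wt, wc = w * t, w * (~t)
    mean_t = others.T @ wt / wt.sum()  # weighted confounder mean, treated group
    mean_c = others.T @ wc / wc.sum()  # weighted confounder mean, control group
    return float(np.sum((mean_t - mean_c) ** 2))

def causal_regularizer(X, w):
    """Sum of the balancing losses, treating each feature in turn as the treatment."""
    X = np.asarray(X, dtype=float)
    return sum(balancing_loss(X, w, j) for j in range(X.shape[1]))
```

In a full method, the sample weights `w` would be learned jointly with the feature-selection model so that this term is driven toward zero. Here it is only evaluated for fixed weights.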