In this paper, we introduce Partial Information Decomposition of Features (PIDF), a new paradigm for simultaneous data interpretability and feature selection. Unlike traditional methods, which assign a single importance value to each feature, our approach is based on three metrics per feature: the mutual information it shares with the target variable, its contribution to synergistic information, and the amount of this information that is redundant. In particular, we develop a novel procedure based on these three metrics that reveals not only how features are correlated with the target but also the additional and overlapping information they provide when considered in combination with other features. We extensively evaluate PIDF on both synthetic and real-world data, and demonstrate its potential applications and effectiveness through case studies from genetics and neuroscience.
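To make the distinction between individual, synergistic, and redundant information concrete, the following is a minimal sketch on toy discrete data. It is not the PIDF procedure or its estimator: it uses a simple plug-in (count-based) mutual information estimate, and the helper names `mutual_information` and `interaction_gain` are hypothetical, introduced only for illustration. The quantity I(F1,F2;Y) - I(F1;Y) - I(F2;Y) is used here as a rough indicator, under the partial information decomposition view that it equals synergy minus redundancy for a pair of features.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def interaction_gain(f1, f2, target):
    """I(F1,F2;Y) - I(F1;Y) - I(F2;Y).

    Positive values suggest net synergy, negative values net redundancy.
    (Illustrative indicator only; an assumption, not the paper's metric.)
    """
    pair = list(zip(f1, f2))
    return (
        mutual_information(pair, target)
        - mutual_information(f1, target)
        - mutual_information(f2, target)
    )


# Balanced toy data: every (a, b) combination appears equally often.
pairs = list(product([0, 1], repeat=2)) * 25
a = [p[0] for p in pairs]
b = [p[1] for p in pairs]

# Synergy: the target is XOR of a and b, so neither feature is informative alone.
xor_target = [x ^ y for x, y in pairs]
mi_a_alone = mutual_information(a, xor_target)
xor_gain = interaction_gain(a, b, xor_target)

# Redundancy: the second feature is a copy of the first, and the target equals a.
copy_of_a = list(a)
dup_gain = interaction_gain(a, copy_of_a, a)

print(f"I(a; xor) = {mi_a_alone:.3f} bits")
print(f"interaction gain (XOR) = {xor_gain:.3f} bits")
print(f"interaction gain (duplicate) = {dup_gain:.3f} bits")
```

In the XOR case a single importance score based on pairwise mutual information would rank both features as useless, even though together they determine the target; in the duplicate case it would rank both as equally important, even though one of them adds nothing. This is the kind of situation that motivates reporting more than one number per feature.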