In this paper, we introduce Partial Information Decomposition of Features (PIDF), a new paradigm for simultaneous data interpretability and feature selection. Unlike traditional methods, which assign a single importance value to each feature, our approach is based on three metrics per feature: the mutual information it shares with the target variable, its contribution to synergistic information, and the amount of this information that is redundant. In particular, we develop a novel procedure based on these three metrics that reveals not only how features are correlated with the target, but also the additional and overlapping information they provide when considered in combination with other features. We extensively evaluate PIDF on both synthetic and real-world data, and demonstrate its potential applications and effectiveness through case studies from genetics and neuroscience.
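To make the distinction between synergistic and redundant information concrete, the following minimal sketch contrasts two toy datasets. It is not the PIDF procedure itself: it assumes discrete features, uses a plug-in estimate of mutual information, and summarizes the pair of features with the simple difference I(X1, X2; Y) - I(X1; Y) - I(X2; Y), which is positive when synergy dominates and negative when redundancy dominates. The function names `mutual_information` and `synergy_minus_redundancy` are hypothetical and chosen for illustration only.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    n = len(xs)
    joint_counts = Counter(zip(xs, ys))
    x_counts = Counter(xs)
    y_counts = Counter(ys)
    return sum(
        (c / n) * log2((c / n) / ((x_counts[x] / n) * (y_counts[y] / n)))
        for (x, y), c in joint_counts.items()
    )


def synergy_minus_redundancy(x1, x2, y):
    """Illustrative gap: I(X1,X2;Y) - I(X1;Y) - I(X2;Y).

    Positive values indicate that the features are more informative
    together than apart; negative values indicate overlapping information.
    This is a simplification, not the decomposition used by PIDF.
    """
    joint = mutual_information(list(zip(x1, x2)), y)
    return joint - mutual_information(x1, y) - mutual_information(x2, y)


# Enumerate the four equally likely combinations of two binary features.
pairs = list(product([0, 1], repeat=2))
a = [p[0] for p in pairs]
b = [p[1] for p in pairs]

# Synergistic case: the target is the XOR of the two features, so neither
# feature alone tells us anything about it.
xor_target = [u ^ v for u, v in zip(a, b)]
xor_gap = synergy_minus_redundancy(a, b, xor_target)

# Redundant case: the second feature duplicates the first, and the target
# equals that feature, so the second feature adds nothing new.
copy_gap = synergy_minus_redundancy(a, a, a)

print(f"XOR features:       gap = {xor_gap:+.1f} bits")
print(f"duplicated feature: gap = {copy_gap:+.1f} bits")
```

In the XOR case each feature has zero mutual information with the target on its own, so a method reporting only a single per-feature importance score would risk discarding both. In the duplicated case each copy looks fully informative even though one of them is unnecessary. This is the kind of situation that motivates reporting several metrics per feature.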