When predictors are statistically dependent, the appropriate definition of feature importance (FI) depends on the operational goal. Conditional-incremental measures are well suited to feature selection, acquisition, and compression, where shared predictive information is treated as redundancy. For post-hoc interpretation, however, the goal is often to attribute predictive signal across correlated measurement channels. We introduce Disentangled Feature Importance (DFI), a population-level attribution framework for this setting. DFI maps covariates to an independent latent representation under a specified entropic optimal transport geometry, computes importance in the latent space, and attributes it back to the original covariates through barycentric sensitivities. We show that a broad class of conditional-incremental FI functionals targets conditional incremental predictive value under squared-error loss, and therefore answers a different question from the attribution of shared predictive signal under dependence. For a fixed transport cost, reference law, and regularization level, DFI defines a well-specified family of estimands. Latent scores admit a functional ANOVA interpretation, and in the Gaussian linear case the attributed DFI recovers the classical $R^2$ decomposition for correlated regressors. We derive influence-function-based inference under nuisance-rate and smoothness conditions, and show in simulations and an HIV-1 neutralization-resistance analysis that DFI yields stable, interpretable, uncertainty-quantified attributions of shared predictive signal.
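To make the Gaussian linear case concrete, the following is a minimal sketch of the map, score, attribute pipeline. It is an illustration under stated assumptions, not the paper's estimator. Predictors are taken to be centered Gaussian with unit variances and correlation matrix `corr`. The entropic transport map is replaced by its unregularized closed form (symmetric whitening by the inverse matrix square root). The attribution weights are taken to be the squared entries of the symmetric square root, which gives a relative-weights style $R^2$ decomposition; the abstract does not state which classical decomposition is meant, so this choice is an assumption. The names `sym_sqrt` and `gaussian_linear_attribution` are hypothetical.

```python
import numpy as np


def sym_sqrt(mat):
    """Symmetric positive semi-definite square root via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.sqrt(vals)) @ vecs.T


def gaussian_linear_attribution(corr, beta):
    """Latent and attributed importance for Y = X @ beta + noise.

    Assumes X ~ N(0, corr) with unit variances (an assumption of this sketch).
    The latent variables Z = corr^{-1/2} X are independent, and
    Y = Z @ (corr^{1/2} beta) + noise.
    """
    root = sym_sqrt(corr)
    # Variance explained by each independent latent coordinate.
    latent = (root @ beta) ** 2
    # Attribute latent scores back to the original covariates using
    # squared sensitivities of X with respect to Z (assumed weighting).
    attributed = (root ** 2) @ latent
    return latent, attributed


# Two strongly correlated predictors and one independent predictor.
corr = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
# The second predictor has a zero coefficient but shares signal with the first.
beta = np.array([1.0, 0.0, 0.5])

latent, attributed = gaussian_linear_attribution(corr, beta)
explained = beta @ corr @ beta  # explained variance of the linear signal

print("latent:", latent)
print("attributed:", attributed)
print("explained variance:", explained)
```

In this toy setting the attributed scores are approximately 0.68, 0.32, and 0.25, and they sum to the explained variance of 1.25. The second predictor receives a nonzero share despite its zero coefficient, which is the behavior that separates attribution of shared signal from a conditional-incremental measure, under which it would score zero.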