Feature importance quantification faces a fundamental challenge: when predictors are correlated, standard methods systematically underestimate their contributions. We prove that major existing approaches target identical population functionals under squared-error loss, revealing why they share this correlation-induced bias. To address this limitation, we introduce \emph{Disentangled Feature Importance (DFI)}, a nonparametric generalization of the classical $R^2$ decomposition via optimal transport. DFI transforms correlated features into independent latent variables using a transport map, eliminating correlation distortion. Importance is computed in this disentangled space and attributed back to the original features through the transport map's sensitivity. DFI provides a principled decomposition of importance scores that sum to the total predictive variability for latent additive models, and to interaction-weighted functional ANOVA variances more generally, under arbitrary feature dependencies. We develop a comprehensive semiparametric theory for DFI. For general transport maps, we establish root-$n$ consistency and asymptotic normality of importance estimators in the latent space, which extend to the original feature space for the Bures--Wasserstein map. Notably, our estimators achieve second-order estimation error, which vanishes if both the regression-function and transport-map estimation errors are $o_{\mathbb{P}}(n^{-1/4})$. By design, DFI avoids the computational burden of repeated submodel refitting and the challenges of conditional covariate distribution estimation, thereby achieving computational efficiency.
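The disentangling idea can be sketched numerically in a Gaussian special case. This is a minimal illustration, not the paper's estimator: for jointly Gaussian features, the Bures--Wasserstein transport to independence is the linear whitening map $Z = \Sigma^{-1/2}X$, and here latent importance is approximated by a permutation-style surrogate for the conditional-expectation functional, using a known regression function `f` for clarity. All function names and the surrogate itself are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 3

# Correlated Gaussian features; only X1 enters the regression function.
Sigma = np.array([[1.0, 0.8, 0.0],
                  [0.8, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
f = lambda X: 2.0 * X[:, 0]  # known regression function, for illustration only

# Bures-Wasserstein transport to independence for Gaussians: Z = Sigma^{-1/2} X.
evals, evecs = np.linalg.eigh(Sigma)
Sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
Sigma_sqrt = evecs @ np.diag(evals ** 0.5) @ evecs.T
Z = X @ Sigma_inv_sqrt  # independent latent coordinates

def latent_importance(j):
    """Permutation-style surrogate: resample latent coordinate j,
    map back through the inverse transport, and measure the shift in f."""
    Zp = Z.copy()
    Zp[:, j] = rng.permutation(Zp[:, j])
    Xp = Zp @ Sigma_sqrt
    return np.mean((f(X) - f(Xp)) ** 2) / 2.0

phi = np.array([latent_importance(j) for j in range(d)])
print(phi)  # latent axes 1 and 2 share the correlated signal; axis 3 is ~0
```

Because the model is additive in the latent coordinates, the scores approximately sum to the total predictive variability $\mathrm{Var}(f(X)) = 4$ here, with the correlated pair splitting the signal rather than masking each other.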