We introduce a novel measure of dependence that captures the extent to which a random variable $Y$ is determined by a random vector $X$. The measure equals zero if and only if $Y$ and $X$ are independent, and it equals one if and only if $Y$ is almost surely a measurable function of $X$. We further extend this framework to define a measure of conditional dependence between $Y$ and $X$ given $Z$. We propose a simple and interpretable estimator whose computational complexity is comparable to that of classical correlation coefficients, including those of Pearson, Spearman, and Chatterjee. Leveraging this dependence measure, we develop a tuning-free, model-agnostic variable selection procedure and establish its consistency under appropriate sparsity conditions. Experiments on synthetic and real datasets show that our methodology performs well empirically and improves substantially on existing approaches.
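To make the two boundary cases concrete (a value near zero under independence, a value near one when $Y$ is a function of $X$), the following minimal sketch computes Chatterjee's rank correlation, one of the classical coefficients named above. It is not the estimator proposed here, which the abstract does not specify. The function name `chatterjee_xi`, the sample size, and the simulated data are illustrative assumptions, and the formula used is the no-ties version for scalar $X$.

```python
import numpy as np


def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n for scalar x and y, assuming no ties.

    Illustrative baseline only; this is not the estimator proposed in the paper.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Reorder the pairs so that x is increasing.
    order = np.argsort(x, kind="stable")
    # Ranks (1..n) of y, taken in the x-sorted order.
    ranks = np.argsort(np.argsort(y[order])) + 1
    # xi_n = 1 - 3 * sum_i |r_{i+1} - r_i| / (n^2 - 1)
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=2000)

# y is a deterministic, non-monotone function of x: the value should be close to one.
xi_functional = chatterjee_xi(x, x**2)

# y is drawn independently of x: the value should be close to zero.
xi_independent = chatterjee_xi(x, rng.normal(size=2000))

print(f"functional: {xi_functional:.3f}, independent: {xi_independent:.3f}")
```

The cost is dominated by sorting, so the sketch runs in $O(n \log n)$ time, which is the kind of complexity the abstract refers to when it compares against classical coefficients.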