We propose a direct and natural extension of Azadkia & Chatterjee's rank correlation $T$, introduced in [4], to a set of $q \geq 1$ endogenous variables. The approach converts the original vector-valued problem into a univariate problem and then applies the rank correlation $T$ to it. The resulting measure $T^q$ quantifies the scale-invariant extent of functional dependence of an endogenous vector ${\bf Y} = (Y_1,\dots,Y_q)$ on a number of exogenous variables ${\bf X} = (X_1,\dots,X_p)$, $p \geq 1$. It characterizes independence of ${\bf X}$ and ${\bf Y}$ as well as perfect dependence of ${\bf Y}$ on ${\bf X}$, and hence fulfills all the desired characteristics of a measure of predictability. Aiming at maximum interpretability, we provide various general invariance and continuity conditions for $T^q$ as well as novel ordering results for conditional distributions, which reveal new insights into the nature of $T$. Building upon the graph-based estimator for $T$ in [4], we present a non-parametric estimator for $T^q$ that is strongly consistent in full generality, i.e., without any distributional assumptions. Based on this estimator, we develop a model-free and dependence-based feature ranking and forward feature selection for multiple-outcome data, and establish tools for identifying networks between random variables. Case studies on real data illustrate the main aspects of the developed methodology.
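To make the reduction to univariate problems concrete, the following is a minimal sketch, not the authors' implementation. It assumes the unconditional nearest-neighbour form of the estimator from [4], $T_n(Y,{\bf X}) = \sum_i \big(n \min\{R_i, R_{N(i)}\} - L_i^2\big) / \sum_i L_i (n - L_i)$, and it assumes that $T^q$ combines univariate terms as $T^q({\bf Y}|{\bf X}) = 1 - \frac{q - \sum_{i=1}^q T(Y_i|({\bf X},Y_{i-1},\dots,Y_1))}{q - \sum_{i=2}^q T(Y_i|(Y_{i-1},\dots,Y_1))}$. This combination formula is not stated in the abstract and should be checked against the paper body. The function names `t_n` and `t_q_n` are hypothetical, the nearest-neighbour search is brute force, and ties are broken by lowest index rather than at random.

```python
import numpy as np


def _as_2d(a):
    """Return a as a float array of shape (n, d)."""
    a = np.asarray(a, dtype=float)
    return a[:, None] if a.ndim == 1 else a


def nearest_neighbor_indices(x):
    """Index of the Euclidean nearest neighbour of each row, excluding itself."""
    x = _as_2d(x)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour
    # Simplification: ties go to the lowest index instead of a random choice.
    return dist.argmin(axis=1)


def t_n(y, x):
    """Sketch of the nearest-neighbour estimator of T(Y | X) for scalar Y."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    r = (y[None, :] <= y[:, None]).sum(axis=1)  # R_i = #{j : y_j <= y_i}
    l = (y[None, :] >= y[:, None]).sum(axis=1)  # L_i = #{j : y_j >= y_i}
    nn = nearest_neighbor_indices(x)
    numerator = np.sum(n * np.minimum(r, r[nn]) - l ** 2)
    denominator = np.sum(l * (n - l))
    return numerator / denominator


def t_q_n(Y, X):
    """Sketch of a plug-in estimator of T^q(Y | X) under the assumed formula."""
    Y, X = _as_2d(Y), _as_2d(X)
    q = Y.shape[1]
    num, den = float(q), float(q)
    for i in range(q):
        previous = Y[:, :i]  # Y_1, ..., Y_{i-1}; empty for the first response
        num -= t_n(Y[:, i], np.hstack([X, previous]))
        if i > 0:  # the term for the first response is taken to be zero
            den -= t_n(Y[:, i], previous)
    return 1.0 - num / den
```

Under the assumed formula, `t_q_n` reduces to `t_n` when $q = 1$. On simulated data one would expect values near 1 when every response is a function of ${\bf X}$ and values near 0 when ${\bf X}$ and ${\bf Y}$ are independent.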