Pairwise measures of dependence are a common tool for mapping data in the early stages of analysis, and several modern examples are based on maximized partitions of the pairwise sample space. Following a short survey of modern measures of dependence, we introduce a new measure that recursively splits the ranks of a pair of variables to partition the sample space and computes the $\chi^2$ statistic on the resulting bins. Splitting logic is detailed both for splits that maximize a score function and for randomly selected splits. Simulations indicate that random splitting produces a statistic conservatively approximated by the $\chi^2$ distribution, with no loss of power to detect numerous data patterns compared to maximized binning. Although it seems to add no power to detect dependence, maximized recursive binning is shown to produce a natural visualization of the data and the measure. Applying maximized recursive rank binning to S&P 500 constituent data suggests that the method can automatically detect tail dependence.
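To make the random-splitting variant concrete, the following is a minimal sketch and not the authors' implementation. It assumes that bins are rectangles in rank space, that the expected count of a bin under independence is its rank area divided by $n$, and that splitting stops at a fixed depth or when a child's expected count would fall below a threshold. The names `random_rank_bins` and `chi2_statistic` and the parameters `max_depth` and `min_expected` are hypothetical choices made for illustration.

```python
import numpy as np
from scipy.stats import rankdata


def random_rank_bins(x, y, max_depth=4, min_expected=5.0, rng=None):
    """Randomly and recursively split the rank square (0, n] x (0, n].

    Returns a list of (observed, expected) count pairs, one per leaf bin.
    The stopping rules here are illustrative assumptions.
    """
    rng = np.random.default_rng(rng)
    n = len(x)
    # Ordinal ranks 1..n; ties are broken by order of appearance.
    ranks = np.column_stack([rankdata(x, method="ordinal"),
                             rankdata(y, method="ordinal")])
    leaves = []

    def split(idx, lo, hi, depth):
        # The bin is (lo[0], hi[0]] x (lo[1], hi[1]] in rank space.
        width = hi - lo
        expected = width[0] * width[1] / n  # expected count under independence
        options = []
        if depth < max_depth:
            for d in (0, 1):
                # Smallest child width keeping its expected count >= min_expected.
                m = int(np.ceil(min_expected * n / width[1 - d]))
                if lo[d] + m <= hi[d] - m:
                    options.append((d, lo[d] + m, hi[d] - m))
        if not options:
            leaves.append((len(idx), expected))
            return
        # Pick a dimension and a cut position uniformly at random.
        d, a, b = options[rng.integers(len(options))]
        cut = rng.integers(a, b + 1)
        below = ranks[idx, d] <= cut
        hi_low, lo_high = hi.copy(), lo.copy()
        hi_low[d] = cut
        lo_high[d] = cut
        split(idx[below], lo, hi_low, depth + 1)
        split(idx[~below], lo_high, hi, depth + 1)

    split(np.arange(n), np.array([0, 0]), np.array([n, n]), 0)
    return leaves


def chi2_statistic(leaves):
    """Pearson chi-squared statistic summed over the leaf bins."""
    return sum((o - e) ** 2 / e for o, e in leaves)
```

Calling `chi2_statistic(random_rank_bins(x, y, rng=0))` on two equal-length samples returns a single nonnegative statistic. The sketch stops short of a p-value because the abstract does not state the degrees of freedom of the approximating $\chi^2$ distribution.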