Conditional mutual information quantifies the conditional dependence of two random variables. It has numerous applications; for example, it forms part of the definition of transfer entropy, a common measure of the causal relationship between time series. It does, however, require a lot of data to estimate accurately and suffers from the curse of dimensionality, limiting its application in machine learning and data science. The Kozachenko-Leonenko approach can address this problem: it makes it possible to define a nearest-neighbour estimator that depends only on the distances between data points, not on the dimension of the data. Furthermore, the bias of this estimator can be calculated analytically. Here the estimator is described and tested on simulated data.
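To illustrate the idea of a distance-only nearest-neighbour estimator, here is a minimal sketch of a Kozachenko-Leonenko-style conditional mutual information estimator in the Frenzel-Pompe form: for each sample, the distance to its k-th nearest neighbour in the joint (x, y, z) space sets a radius, and only neighbour counts within that radius in the (x, z), (y, z), and z subspaces enter the estimate. This is a generic sketch of the technique, not necessarily the exact estimator developed in the paper; the function names and the choice k=3 are illustrative.

```python
import math
import random

def _digamma(n):
    """psi(n) for a positive integer n: psi(n) = -gamma + H_{n-1}."""
    return -0.5772156649015329 + sum(1.0 / k for k in range(1, n))

def _chebyshev(a, b):
    """Max-norm distance, the metric commonly used with these estimators."""
    return max(abs(u - v) for u, v in zip(a, b))

def cmi_knn(xs, ys, zs, k=3):
    """Nearest-neighbour estimate of I(X;Y|Z) (Frenzel-Pompe style sketch).

    xs, ys, zs are lists of tuples, one sample per entry. The estimate
    depends only on inter-point distances, not on the data dimension:
    I ~ psi(k) - <psi(n_xz + 1) + psi(n_yz + 1) - psi(n_z + 1)>.
    """
    n = len(xs)
    joint = [xs[i] + ys[i] + zs[i] for i in range(n)]
    total = 0.0
    for i in range(n):
        # Radius: distance to the k-th nearest neighbour in the joint space.
        dists = sorted(_chebyshev(joint[i], joint[j]) for j in range(n) if j != i)
        eps = dists[k - 1]
        # Count strictly closer neighbours in each marginal subspace.
        n_xz = sum(1 for j in range(n) if j != i
                   and _chebyshev(xs[i] + zs[i], xs[j] + zs[j]) < eps)
        n_yz = sum(1 for j in range(n) if j != i
                   and _chebyshev(ys[i] + zs[i], ys[j] + zs[j]) < eps)
        n_z = sum(1 for j in range(n) if j != i
                  and _chebyshev(zs[i], zs[j]) < eps)
        total += (_digamma(k) - _digamma(n_xz + 1)
                  - _digamma(n_yz + 1) + _digamma(n_z + 1))
    return total / n
```

On simulated data the estimator behaves as expected: when Y depends on X beyond what Z explains, the estimate is clearly positive, and when Y is independent of X given Z it is near zero. The brute-force O(n²) neighbour search here is for clarity only; a k-d tree would be used in practice.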