Pairwise dependence measures such as correlation and causality are fundamental to temporal data mining, yet there is still no principled and robust way to quantify dependence between heterogeneous data types, especially between continuous time series and discrete temporal event sequences. Existing approaches rely on ad hoc transformations or on mutual-information estimators that are highly sensitive to quantization, repeated values, and event redundancy, which leads to biased or unstable results in practice. We propose a nonparametric mutual information estimator that directly measures the dependence between time series and event sequences without data transformation, learning, or ad hoc discretization. Our method models the continuous-discrete duality of real-world time series to handle quantization and repeated-value artifacts, and it introduces a latent event clustering strategy to mitigate bias from event co-occurrence and redundancy. Together, these components yield a robust and unified framework that bridges discrete and continuous mutual information. We evaluate the proposed estimator on four representative tasks: discrete-continuous time-delayed mutual information for causality analysis, global and local temporal repetition discovery, discrete covariate selection for time series forecasting, and continuous feature selection for classification. Experiments on synthetic and real-world datasets show consistent improvements over existing methods in accuracy, robustness, and interpretability, positioning our approach as a general-purpose dependence operator for heterogeneous temporal data, analogous to the role Pearson correlation plays for homogeneous time series. Code is available at: https://github.com/HaojiHu/Multimodal-Temporal-Data-Quantification
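To make the quantity behind the first evaluation task concrete, the sketch below computes a time-delayed mutual information profile between a binary event sequence and a continuous series. It is not the proposed estimator: it uses a naive equal-width-binning plug-in estimate, which is exactly the kind of ad hoc discretization the abstract argues against, and it serves only as an illustrative baseline. The function names (`binned_mi`, `time_delayed_mi`), the `n_bins` parameter, and the synthetic data are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def binned_mi(events, values, n_bins=8):
    """Plug-in MI (in nats) between discrete labels and equal-width-binned values.

    Naive baseline only; NOT the estimator proposed in the paper.
    """
    if len(events) != len(values) or not events:
        raise ValueError("inputs must be non-empty and of equal length")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant series
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    n = len(events)
    joint = Counter(zip(events, bins))
    count_e = Counter(events)
    count_b = Counter(bins)
    mi = 0.0
    for (e, b), c in joint.items():
        # p(e,b) * log( p(e,b) / (p(e) * p(b)) ) with empirical frequencies
        mi += (c / n) * math.log(c * n / (count_e[e] * count_b[b]))
    return mi


def time_delayed_mi(events, values, max_lag, n_bins=8):
    """MI between events[t] and values[t + lag] for lag = 0..max_lag."""
    return [
        binned_mi(events[: len(events) - lag], values[lag:], n_bins)
        for lag in range(max_lag + 1)
    ]


# Hypothetical synthetic data: each event shifts the series 3 steps later.
random.seed(0)
n, true_lag = 2000, 3
events = [1 if random.random() < 0.2 else 0 for _ in range(n)]
values = [random.gauss(0.0, 1.0) for _ in range(n)]
for t in range(n - true_lag):
    if events[t]:
        values[t + true_lag] += 3.0

profile = time_delayed_mi(events, values, max_lag=6)
best_lag = max(range(len(profile)), key=profile.__getitem__)
print("MI by lag:", [round(m, 4) for m in profile])
print("lag with highest MI:", best_lag)
```

On data like this, the profile should peak at the lag where the events actually influence the series. The result depends on `n_bins`, and heavily quantized or repeated values can distort the bins, which illustrates the sensitivity that motivates a discretization-free estimator.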