Pairwise dependence measures such as correlation and causality are fundamental to temporal data mining, yet there is still no principled and robust way to quantify dependence between heterogeneous data types, especially between continuous time series and discrete temporal event sequences. Existing approaches rely on ad hoc transformations or on mutual-information estimators that are highly sensitive to quantization, repeated values, and event redundancy, leading to biased or unstable results in practice. We propose a nonparametric mutual information estimator that directly measures the dependence between time series and event sequences without data transformation, learning, or ad hoc discretization. Our method models the continuous-discrete duality of real-world time series to handle quantization and repeated-value artifacts, and it introduces a latent event clustering strategy to mitigate bias from event co-occurrence and redundancy. Together, these two components yield a robust and unified framework that bridges discrete and continuous mutual information. We evaluate the proposed estimator on four representative tasks: discrete-continuous time-delayed mutual information for causality analysis, global and local temporal repetition discovery, discrete covariate selection for time series forecasting, and continuous feature selection for classification. Experiments on synthetic and real-world datasets show consistent improvements over existing methods in accuracy, robustness, and interpretability, positioning our approach as a general-purpose dependence operator for heterogeneous temporal data, analogous to the role of Pearson correlation for homogeneous time series. Code is available at: https://github.com/HaojiHu/Multimodal-Temporal-Data-Quantification
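To make the first evaluation task concrete, the sketch below computes a time-delayed mutual information profile between a binary event sequence and a continuous series. It is not the estimator proposed in this work: it uses a naive equal-width histogram plug-in estimate, which is the kind of ad hoc discretization the abstract argues against, and it serves only as a baseline illustration. The function names `binned_mi` and `delayed_mi`, the bin count, and the synthetic data (an event at time t raises the series three steps later) are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def binned_mi(events, values, n_bins=8):
    """Plug-in mutual information (in nats) between discrete labels and
    a continuous sequence discretized into equal-width bins.

    Naive baseline only; NOT the estimator proposed in the paper.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant series
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]

    n = len(values)
    joint = Counter(zip(events, bins))
    count_e = Counter(events)
    count_b = Counter(bins)

    # I(E; B) = sum over (e, b) of p(e, b) * log( p(e, b) / (p(e) * p(b)) )
    return sum(
        (c / n) * math.log(c * n / (count_e[e] * count_b[b]))
        for (e, b), c in joint.items()
    )


def delayed_mi(events, series, max_lag, n_bins=8):
    """Mutual information between events[t] and series[t + lag], lag = 0..max_lag."""
    profile = {}
    for lag in range(max_lag + 1):
        aligned_events = events[: len(events) - lag]
        aligned_series = series[lag:]
        profile[lag] = binned_mi(aligned_events, aligned_series, n_bins)
    return profile


# Hypothetical synthetic data: an event at time t shifts the series at t + 3.
random.seed(0)
TRUE_LAG = 3
n_steps = 2000
events = [1 if random.random() < 0.1 else 0 for _ in range(n_steps)]
series = [
    random.gauss(0.0, 1.0) + (3.0 * events[t - TRUE_LAG] if t >= TRUE_LAG else 0.0)
    for t in range(n_steps)
]

mi_by_lag = delayed_mi(events, series, max_lag=6)
best_lag = max(mi_by_lag, key=mi_by_lag.get)
print("lag with highest mutual information:", best_lag)
```

On this synthetic example the profile peaks at the injected lag, which is the signal a causality analysis would look for. The histogram estimate depends on `n_bins` and degrades on quantized or repeated values, which is the sensitivity that motivates a discretization-free estimator.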