Multivariate sequential data collected in practice often exhibit temporal irregularities, including nonuniform time intervals and misalignment across components. When uneven spacing and asynchrony are endogenous characteristics of the data rather than a result of insufficient observation, the information carried by these irregularities plays a defining role in characterizing the multivariate dependence structure. Existing approaches to probabilistic forecasting either overlook the resulting statistical heterogeneities, are susceptible to imputation biases, or impose parametric assumptions on the data distribution. This paper proposes an end-to-end solution that overcomes these limitations by giving the observation arrival times, which are at the core of temporal irregularities, the central role in model construction. To account for temporal irregularities, we first assign each component its own hidden state so that the arrival times can dictate when, how, and which hidden states to update. We then develop a conditional flow representation that non-parametrically models the data distribution, which is typically non-Gaussian, and we supervise this representation by carefully factorizing the log-likelihood objective so as to select conditional information that facilitates capturing time variation and path dependency. The broad applicability and superiority of the proposed solution are confirmed through ablation studies and comparisons with existing approaches on real-world datasets.
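To make the idea of arrival-driven, per-component hidden states concrete, the following is a minimal sketch and not the paper's actual model. The abstract does not specify the update rule, so the function name `update_hidden_states`, the scalar hidden states, the exponential decay with `decay_rate`, and the `tanh` mixing step are all illustrative assumptions. The sketch only shows how arrival times can decide when a state is updated, which component's state is touched, and how strongly the old state is retained.

```python
import math


def update_hidden_states(events, num_components, decay_rate=0.5):
    """Process (time, component, value) events in time order.

    Hypothetical update rule (an assumption, not taken from the paper):
    each component keeps its own scalar hidden state, which decays
    exponentially over the time elapsed since that component's last
    arrival and is then mixed with the new observation.
    """
    hidden = [0.0] * num_components       # one hidden state per component
    last_time = [None] * num_components   # last arrival time per component

    for t, k, x in sorted(events):
        if last_time[k] is not None:
            # "how": the elapsed gap controls how much old state survives
            hidden[k] *= math.exp(-decay_rate * (t - last_time[k]))
        # "when" and "which": only the arriving component is updated, at time t
        hidden[k] = math.tanh(hidden[k] + x)
        last_time[k] = t

    return hidden, last_time


# Asynchronous, unevenly spaced observations; component 2 never arrives.
events = [(0.0, 0, 1.0), (0.7, 1, -0.5), (2.5, 0, 0.3)]
hidden, last_time = update_hidden_states(events, num_components=3)
```

In this toy example, component 2 keeps its initial state because no observation for it ever arrives, while component 0 is updated twice with a gap of 2.5 time units between arrivals. No imputation onto a common time grid is needed.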