Multivariate sequential data collected in practice often exhibit temporal irregularities, including nonuniform time intervals and component misalignment. If uneven spacing and asynchrony are endogenous characteristics of the data rather than a result of insufficient observation, the information carried by these irregularities plays a defining role in characterizing the multivariate dependence structure. Existing approaches to probabilistic forecasting either overlook the resulting statistical heterogeneities, are susceptible to imputation biases, or impose parametric assumptions on the data distribution. This paper proposes an end-to-end solution that overcomes these limitations by giving the observation arrival times, which are at the core of temporal irregularities, the central role in model construction. To account for temporal irregularities, we first assign a unique hidden state to each component so that the arrival times can dictate when, how, and which hidden states to update. We then develop a conditional flow representation that models the data distribution, which is typically non-Gaussian, non-parametrically, and we supervise this representation by carefully factorizing the log-likelihood objective to select conditional information that helps capture time variation and path dependency. The broad applicability and superiority of the proposed solution are confirmed through ablation studies and comparisons with existing approaches on real-world datasets.
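To make the idea of arrival-time-driven updates concrete, the following is a minimal illustrative sketch, not the architecture proposed in the paper. It assumes scalar hidden states, an exponential decay over the elapsed time, and a fixed blending weight; the function name `update_hidden_states` and the parameters `decay` and `gain` are hypothetical. Each component keeps its own hidden state, and an observation updates only the state of the component it belongs to, at the time it arrives.

```python
import math


def update_hidden_states(hidden, last_time, events, decay=0.5, gain=0.3):
    """Hypothetical per-component update driven by arrival times.

    hidden:    dict mapping component name -> scalar hidden state
    last_time: dict mapping component name -> time of its last update
    events:    list of (time, component, value) tuples, sorted by time
    """
    hidden = dict(hidden)        # work on copies so the inputs stay untouched
    last_time = dict(last_time)
    for t, comp, value in events:
        # "When": the elapsed time since this component was last observed.
        dt = t - last_time[comp]
        # "How": decay the stale state by an amount that depends on dt
        # (exponential decay is an assumption made for this sketch).
        decayed = hidden[comp] * math.exp(-decay * dt)
        # "Which": only the observed component's state is updated.
        hidden[comp] = (1.0 - gain) * decayed + gain * value
        last_time[comp] = t
    return hidden, last_time


# Two asynchronous components observed at uneven times.
hidden0 = {"a": 0.0, "b": 0.0}
time0 = {"a": 0.0, "b": 0.0}
events = [(0.4, "a", 1.0), (1.0, "b", 2.0), (1.1, "a", 0.5)]
hidden1, time1 = update_hidden_states(hidden0, time0, events)
```

No imputation onto a common grid is needed in this sketch: component `b` is simply left alone while `a` receives observations, and the gap lengths enter the update directly through `dt`.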