We study the problem of matching correlated vector autoregressive (VAR) time series databases, where a multivariate time series is observed along with a perturbed and permuted version of itself, and the goal is to recover the unknown matching between them. To model this, we introduce a probabilistic framework in which two time series $(x_t)_{t\in[T]}$ and $(x^\#_t)_{t\in[T]}$ are jointly generated such that $x^\#_t=x_{\pi^*(t)}+\sigma\tilde{x}_{\pi^*(t)}$ for a hidden permutation $\pi^*$ of $[T]$, where $(x_t)_{t\in[T]}$ and $(\tilde{x}_t)_{t\in[T]}$ are independent and identically distributed VAR time series of order $1$ with Gaussian increments. The objective is to recover $\pi^*$ from the observation of $(x_t)_{t\in[T]}$ and $(x^\#_t)_{t\in[T]}$. This generalizes the classical problem of matching independent point clouds to the time series setting. We derive the maximum likelihood estimator (MLE), which leads to a quadratic optimization problem over permutations, and theoretically analyze an estimator based on linear assignment. For the latter, we establish recovery guarantees, identifying thresholds on $\sigma$ that allow for perfect or partial recovery. Additionally, we propose solving the MLE through convex relaxations of the set of permutation matrices (e.g., to the Birkhoff polytope), which allows for efficient estimation of $\pi^*$ and the VAR parameters via alternating minimization. Empirically, we find that linear assignment often matches or outperforms the approaches based on relaxing the MLE.
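The generative model and a linear assignment estimator can be illustrated with a minimal sketch. Everything beyond the model equation is an assumption made for illustration: the transition matrix `A`, the initial condition $x_1\sim N(0,I)$, the unit-variance increments, the squared Euclidean cost, and the helper names `simulate_var1` and `match_linear_assignment` are not taken from the paper, whose estimator may differ in its details.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def simulate_var1(A, T, rng):
    """Simulate a VAR(1) path x_{t+1} = A x_t + xi_t with standard Gaussian xi_t.

    Assumption: the first point is drawn from N(0, I).
    """
    d = A.shape[0]
    x = np.zeros((T, d))
    x[0] = rng.standard_normal(d)
    for t in range(1, T):
        x[t] = A @ x[t - 1] + rng.standard_normal(d)
    return x


def match_linear_assignment(x, x_sharp):
    """Return pi_hat minimizing sum_t ||x_sharp[t] - x[pi_hat[t]]||^2 over permutations."""
    # cost[t, s] = squared distance between x_sharp[t] and x[s]
    diff = x_sharp[:, None, :] - x[None, :, :]
    cost = np.einsum("tsd,tsd->ts", diff, diff)
    # Rows are returned in sorted order, so the column indices give pi_hat directly.
    _, pi_hat = linear_sum_assignment(cost)
    return pi_hat


rng = np.random.default_rng(0)
T, d, sigma = 50, 5, 0.05          # illustrative sizes and noise level
A = 0.5 * np.eye(d)                # assumed stable transition matrix

x = simulate_var1(A, T, rng)
x_tilde = simulate_var1(A, T, rng)  # independent copy with the same law
pi_star = rng.permutation(T)        # hidden matching

# x_sharp[t] = x[pi_star[t]] + sigma * x_tilde[pi_star[t]]
x_sharp = x[pi_star] + sigma * x_tilde[pi_star]

pi_hat = match_linear_assignment(x, x_sharp)
overlap = np.mean(pi_hat == pi_star)  # fraction of correctly matched indices
print(f"fraction of correctly matched indices: {overlap:.2f}")
```

Increasing `sigma` in this sketch is one way to observe the transition from perfect to partial recovery, although the thresholds at which it occurs here depend on the assumed `A`, `d`, and `T`.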