State-space models (SSMs) are a powerful statistical tool for modelling time-varying systems via a latent state. In these models, the latent state is never directly observed; instead, a sequence of data points related to the state is obtained. The linear-Gaussian state-space model is widely used because it allows for exact inference when all model parameters are known. However, this is rarely the case in practice, and estimating these parameters is a challenging but essential prerequisite for inference and prediction. In the linear-Gaussian model, the state dynamics are described via a state transition matrix. This parameter is known to be hard to estimate, since it encodes the relationships between the state elements, which are never observed. In many applications, the transition matrix is sparse, since not every state component directly affects every other state component. However, most parameter estimation methods do not exploit this feature. In this work, we propose SpaRJ, a fully probabilistic Bayesian approach that obtains sparse samples from the posterior distribution of the transition matrix. Our method explores sparsity by traversing a set of models that exhibit differing sparsity patterns in the transition matrix. Moreover, we design new, effective rules to explore transition matrices within the same level of sparsity. This novel methodology has strong theoretical guarantees and unveils the latent structure of the data-generating process, thereby enhancing interpretability. The performance of SpaRJ is showcased in an example with a 144-dimensional parameter space and in a numerical example with real data.
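To illustrate why known parameters permit exact inference, the following is a minimal sketch, assuming the standard linear-Gaussian form x_t = A x_{t-1} + q_t, y_t = H x_t + r_t with q_t ~ N(0, Q) and r_t ~ N(0, R). The notation, the function name `kalman_loglik`, the 3-dimensional state, and all numerical values are illustrative assumptions, not taken from this work. The sketch uses the Kalman filter to evaluate the likelihood of a candidate transition matrix A in closed form; it is not the SpaRJ sampler itself.

```python
import numpy as np

def kalman_loglik(A, H, Q, R, y, m0, P0):
    """Log-likelihood log p(y_1:T | A) of a linear-Gaussian SSM via the Kalman filter."""
    m, P = m0, P0
    loglik = 0.0
    for yt in y:
        # Prediction step: propagate mean and covariance through the dynamics.
        m_pred = A @ m
        P_pred = A @ P @ A.T + Q
        # Innovation and its covariance.
        v = yt - H @ m_pred
        S = H @ P_pred @ H.T + R
        # Kalman gain K = P_pred H^T S^{-1} (S and P_pred are symmetric).
        K = np.linalg.solve(S, H @ P_pred).T
        # Update step.
        m = m_pred + K @ v
        P = P_pred - K @ S @ K.T
        # Accumulate the Gaussian log-density of the innovation.
        _, logdet = np.linalg.slogdet(S)
        loglik += -0.5 * (len(yt) * np.log(2.0 * np.pi) + logdet
                          + v @ np.linalg.solve(S, v))
    return loglik

# Hypothetical sparse transition matrix: zeros mean "no direct influence".
A_true = np.array([[0.8, 0.0, 0.0],
                   [0.3, 0.5, 0.0],
                   [0.0, 0.0, 0.7]])
dx = A_true.shape[0]
H = np.eye(dx)
Q = 0.1 * np.eye(dx)
R = 0.1 * np.eye(dx)
m0, P0 = np.zeros(dx), np.eye(dx)

# Simulate observations from the assumed model.
rng = np.random.default_rng(0)
x = np.zeros(dx)
observations = []
for _ in range(200):
    x = A_true @ x + rng.multivariate_normal(np.zeros(dx), Q)
    observations.append(H @ x + rng.multivariate_normal(np.zeros(dx), R))
y = np.array(observations)

# Compare the generating matrix against a fully sparse (all-zero) candidate.
ll_true = kalman_loglik(A_true, H, Q, R, y, m0, P0)
ll_zero = kalman_loglik(np.zeros((dx, dx)), H, Q, R, y, m0, P0)
```

Because the likelihood of any candidate A is available exactly, candidates with different sparsity patterns can be compared directly, which is the kind of quantity a posterior sampler over transition matrices would need to evaluate.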