State-space models (SSMs) are a powerful statistical tool for modelling time-varying systems via a latent state. In these models, the latent state is never directly observed; instead, a sequence of observations related to the state is obtained. The linear-Gaussian state-space model is widely used because it allows for exact inference when all model parameters are known. However, this is rarely the case in practice, and estimating these parameters is a challenging but essential step before inference and prediction can be performed. In the linear-Gaussian model, the state dynamics are described by a state transition matrix. This parameter is known to be hard to estimate, since it encodes the relationships between the state elements, which are never observed. In many applications, the transition matrix is sparse, since not every state component directly affects every other state component. However, most parameter estimation methods do not exploit this feature. In this work we propose SpaRJ, a fully probabilistic Bayesian approach that obtains sparse samples from the posterior distribution of the transition matrix. Our method explores sparsity by traversing a set of models that exhibit differing sparsity patterns in the transition matrix. Moreover, we design new effective rules to explore transition matrices within the same level of sparsity. This novel methodology has strong theoretical guarantees and unveils the latent structure of the data-generating process, thereby enhancing interpretability. The performance of SpaRJ is showcased in an example with dimension 144 in the parameter space, and in a numerical example with real data.