Using matrix-product states for time-series machine learning

Matrix-product states (MPS) have proven to be a versatile ansatz for modeling quantum many-body physics. For many applications, particularly in one dimension, they capture the relevant quantum correlations in many-body wavefunctions while remaining tractable to store and manipulate on a classical computer. This has motivated researchers to apply the MPS ansatz to machine learning (ML) problems, where capturing complex correlations in datasets is likewise a key requirement. Here, we develop and apply an MPS-based algorithm, MPSTime, for learning the joint probability distribution underlying an observed time-series dataset, and show how it can be used to tackle important time-series ML problems, including classification and imputation. MPSTime can efficiently learn complicated time-series probability distributions directly from data. It requires only a moderate maximum MPS bond dimension $\chi_{\rm max}$, between $\chi_{\rm max} = 20$ and $\chi_{\rm max} = 150$ in our applications, and can be trained for both classification and imputation tasks under a single logarithmic loss function. Using synthetic and publicly available real-world datasets spanning applications in medicine, energy, and astronomy, we demonstrate performance competitive with state-of-the-art ML approaches, with the key advantage of encoding the full joint probability distribution learned from the data. By sampling from this distribution and calculating its conditional entanglement entropy, we show how its underlying structure can be uncovered and interpreted. This manuscript is accompanied by the release of MPSTime, a publicly available code package that implements our approach. The efficiency of the MPS ansatz for learning complex correlation structures from time-series data is likely to underpin interpretable advances on challenging time-series ML problems across science, industry, and medicine.
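As a rough illustration of how an MPS can encode a joint probability distribution over a time series and be scored with a logarithmic loss, the following Python sketch evaluates a Born-rule MPS, $p(x) = \psi(x)^2 / Z$, on discretized series. It is not the MPSTime implementation and performs no training. The helpers `random_mps`, `discretize`, `amplitude`, `norm_squared`, and `nll` are hypothetical, and each time step is assumed to be binned into one of `d` discrete states; MPSTime's actual encoding of continuous values may differ.

```python
import itertools

import numpy as np


def random_mps(n_sites, d, chi, seed=0):
    """Hypothetical helper: random real MPS with open boundaries.

    Tensor i has shape (left bond, local dimension d, right bond);
    the outermost bonds have dimension 1.
    """
    rng = np.random.default_rng(seed)
    dims = [1] + [chi] * (n_sites - 1) + [1]
    return [rng.normal(size=(dims[i], d, dims[i + 1])) for i in range(n_sites)]


def discretize(series, d, lo, hi):
    """Assumed, simplistic encoding: map real values to integer bins 0..d-1."""
    inner_edges = np.linspace(lo, hi, d + 1)[1:-1]
    return np.digitize(series, inner_edges)


def amplitude(mps, x):
    """Unnormalized amplitude psi(x): multiply the matrices selected by x."""
    v = np.ones((1,))
    for A, xi in zip(mps, x):
        v = v @ A[:, xi, :]
    return v[0]


def norm_squared(mps):
    """Z = sum_x psi(x)^2, computed with left-to-right transfer matrices.

    With a pairwise contraction order (optimize=True) each step costs
    roughly d * chi^3, instead of summing over all d^n configurations.
    """
    E = np.ones((1, 1))
    for A in mps:
        # E_new[k, l] = sum_{i, j, s} E[i, j] * A[i, s, k] * A[j, s, l]
        E = np.einsum("ij,isk,jsl->kl", E, A, A, optimize=True)
    return E[0, 0]


def nll(mps, data):
    """Average negative log-likelihood under the Born rule p(x) = psi(x)^2 / Z."""
    log_z = np.log(norm_squared(mps))
    return -np.mean([np.log(amplitude(mps, x) ** 2) - log_z for x in data])


if __name__ == "__main__":
    mps = random_mps(n_sites=4, d=3, chi=2)
    # Brute-force check that the Born-rule probabilities sum to one.
    z = norm_squared(mps)
    configs = itertools.product(range(3), repeat=4)
    total = sum(amplitude(mps, x) ** 2 for x in configs) / z
    print(round(total, 6))  # 1.0
    series = np.array([[0.1, 0.5, 0.9, 0.3], [0.2, 0.4, 0.8, 0.7]])
    data = [discretize(s, d=3, lo=0.0, hi=1.0) for s in series]
    print(nll(mps, data))
```

The transfer-matrix contraction is what keeps the normalization tractable: its cost grows linearly with the series length and polynomially with the bond dimension, which is one reason a moderate $\chi_{\rm max}$ matters in practice.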

翻译：矩阵乘积态（MPS）已被证明是模拟量子多体物理的一种通用变分波函数形式。在许多应用中，尤其在一维情形下，MPS 能够捕捉多体波函数中的关键量子关联，同时其存储与经典计算机操作仍具有可处理性。这促使研究人员也将 MPS 变分形式应用于机器学习（ML）问题，因为在这些问题中捕捉数据集中的复杂关联同样是核心要求。本文中，我们开发并应用了一种基于 MPS 的算法——MPSTime，用于学习观测到的时间序列数据集背后的联合概率分布，并展示了如何利用它来解决重要的时间序列机器学习问题，包括分类与填补。MPSTime 能够直接从数据中高效学习复杂的时间序列概率分布，仅需适中的最大 MPS 键维数 $\chi_{\rm max}$（在我们的应用中其值介于 $\chi_{\rm max} = 20-150$ 之间），并且可以在单一的对数损失函数下同时训练分类与填补任务。通过使用合成数据集以及公开可得的真实世界数据集（涵盖医学、能源和天文学领域的应用），我们展示了其性能可与最先进的机器学习方法相媲美，但具有关键优势：能够编码从数据中学到的完整联合概率分布。通过对联合概率分布进行采样并计算其条件纠缠熵，我们揭示了其底层结构并提供了可解释性。本文附带了公开可用的代码包 MPSTime 的发布，该包实现了我们的方法。基于 MPS 的变分形式从时间序列数据中学习复杂关联结构的效率，有望为科学、工业和医学领域中具有挑战性的时间序列机器学习问题提供可解释的进展。