Information Theoretically Optimal Sample Complexity of Learning Dynamical Directed Acyclic Graphs

In this article, the optimal sample complexity of learning the underlying interactions/dependencies of a Linear Dynamical System (LDS) over a Directed Acyclic Graph (DAG) is studied. The sample complexity of learning a DAG's structure is well studied for static systems, where the samples of nodal states are independent and identically distributed (i.i.d.). However, it is less explored for DAGs with dynamical systems, where the nodal states are temporally correlated. We call such a DAG underlying an LDS a \emph{dynamical} DAG (DDAG). In particular, we consider a DDAG where the nodal dynamics are driven by unobserved exogenous noise sources that are wide-sense stationary (WSS) in time, mutually uncorrelated, and have the same power spectral density (PSD). Inspired by the static setting, a metric and an algorithm based on the PSD matrix of the observed time series are proposed to reconstruct the DDAG. The equal noise PSD assumption can be relaxed, provided that the identifiability conditions for DDAG reconstruction are not violated. For an LDS with WSS (sub-)Gaussian exogenous noise sources, it is shown that the optimal sample complexity (or length of state trajectory) needed to learn the DDAG is $n=\Theta(q\log(p/q))$, where $p$ is the number of nodes and $q$ is the maximum number of parents per node. To prove the sample complexity upper bound, a concentration bound for PSD estimation is derived under two different sampling strategies. A matching minimax lower bound, obtained using the generalized Fano's inequality, is also provided, showing that the proposed algorithm is order optimal.
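To make the PSD-based idea concrete, the sketch below is a minimal illustration under stated assumptions; it is not the paper's metric, algorithm, or either of its sampling strategies. It assumes (1) the PSD matrix is estimated with a Bartlett-style averaged periodogram over non-overlapping segments of one trajectory, and (2) a topological ordering is built greedily by picking, at each step, the node with the smallest frequency-averaged conditional PSD $\Phi_{jj}(\omega)-\Phi_{jS}(\omega)\Phi_{SS}(\omega)^{-1}\Phi_{Sj}(\omega)$ given the already-ordered set $S$, a frequency-domain analogue of the static equal-variance ordering rule. The function names `estimate_psd` and `order_nodes`, the toy three-node system, and all parameter values are hypothetical.

```python
import numpy as np


def estimate_psd(x, seg_len):
    """Averaged-periodogram (Bartlett) estimate of the PSD matrix (assumed estimator).

    x: real array of shape (n, p), one trajectory of n samples for p nodes.
    Returns an array of shape (seg_len, p, p) whose k-th slice estimates the
    PSD matrix at frequency 2*pi*k/seg_len.
    """
    n, p = x.shape
    num_seg = n // seg_len  # non-overlapping segments; leftover samples are dropped
    phi = np.zeros((seg_len, p, p), dtype=complex)
    for s in range(num_seg):
        seg = x[s * seg_len:(s + 1) * seg_len]
        X = np.fft.fft(seg, axis=0)  # DFT of each node's segment, shape (seg_len, p)
        phi += X[:, :, None] * X[:, None, :].conj()  # X_k X_k^H at every frequency k
    return phi / (num_seg * seg_len)


def order_nodes(phi):
    """Hypothetical greedy ordering by smallest frequency-averaged conditional PSD.

    At each step, pick the remaining node j minimizing the mean over frequencies
    of Phi_jj - Phi_jS Phi_SS^{-1} Phi_Sj, where S holds the nodes already ordered.
    """
    _, p, _ = phi.shape
    order, remaining = [], list(range(p))
    while remaining:
        scores = []
        for j in remaining:
            cond = phi[:, j, j].real.copy()
            if order:
                S = order
                phi_SS = phi[:, S][:, :, S]  # shape (L, |S|, |S|)
                phi_jS = phi[:, j, S]  # shape (L, |S|)
                # Solve Phi_SS w = Phi_Sj separately at every frequency.
                w = np.linalg.solve(phi_SS, phi[:, S, j][..., None])[..., 0]
                cond -= np.einsum("ks,ks->k", phi_jS, w).real  # Schur complement
            scores.append(cond.mean())
        best = remaining[int(np.argmin(scores))]
        order.append(best)
        remaining.remove(best)
    return order


# Toy DDAG with topological order 2 -> 0 -> 1, one-step lags, and i.i.d.
# unit-variance Gaussian noise (so every node has the same flat noise PSD).
rng = np.random.default_rng(0)
n = 20000
e = rng.standard_normal((n, 3))
x = np.zeros((n, 3))
for t in range(n):
    prev = x[t - 1] if t > 0 else np.zeros(3)
    x[t, 2] = e[t, 2]
    x[t, 0] = 0.8 * prev[2] + e[t, 0]
    x[t, 1] = 0.8 * prev[0] + e[t, 1]

phi = estimate_psd(x, seg_len=32)
order = order_nodes(phi)
print(order)  # [2, 0, 1]
```

In this toy system the true conditional PSDs are flat: node 2 has PSD 1, node 0 has 1.64 unconditionally but 1 given node 2, and node 1 has 1.64 given node 2. These gaps are large relative to the estimation error with 625 segments. The sketch only recovers a topological ordering. Selecting each node's parent set would need a further step, which is not shown. The segment length `seg_len` trades frequency resolution (bias) against the number of averaged segments (variance).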

翻译：本文研究了在有向无环图（DAG）上学习线性动态系统（LDS）中潜在相互作用/依赖关系的最优样本复杂度。对于静态系统（其中节点状态的样本独立同分布，i.i.d.），学习DAG结构的样本复杂度已有充分研究。然而，针对节点状态存在时间相关性的动态系统，此类研究尚不充分。我们将这种作为LDS基础的DAG称为动态DAG（DDAG）。具体而言，我们考虑一种DDAG，其中节点动力学由未观测到的外生噪声源驱动，这些噪声源在时间上是广义平稳（WSS）的，但彼此互不相关，且具有相同的功率谱密度（PSD）。受静态场景启发，我们提出了一种基于观测时间序列PSD矩阵的度量标准和重构DDAG的算法。可放宽等噪声PSD假设，使得DDAG重构的可辨识性条件不被违反。对于具有WSS（亚）高斯外生噪声源的LDS，研究表明学习DDAG所需的最优样本复杂度（即状态轨迹长度）为$n=\Theta(q\log(p/q))$，其中$p$为节点数，$q$为每个节点的最大父节点数。为证明样本复杂度的上界，我们在两种不同采样策略下推导了PSD估计的集中界。同时，利用广义Fano不等式给出了匹配的极小极大下界，从而证明了所提算法的阶最优性。