Predictability Enables Parallelization of Nonlinear State Space Models

The rise of parallel computing hardware has made it increasingly important to understand which nonlinear state space models can be efficiently parallelized. Recent advances like DEER (arXiv:2309.12252) and DeepPCR (arXiv:2309.16318) recast sequential evaluation as a parallelizable optimization problem, sometimes yielding dramatic speedups. However, the factors governing the difficulty of these optimization problems have remained unclear, limiting broader adoption. In this work, we establish a precise relationship between a system's dynamics and the conditioning of its corresponding optimization problem, as measured by its Polyak-Łojasiewicz (PL) constant. We show that the predictability of a system, defined as the degree to which small perturbations in state influence future behavior and quantified by the largest Lyapunov exponent (LLE), impacts the number of optimization steps required for evaluation. For predictable systems, the state trajectory can be computed in at worst $O((\log T)^2)$ time, where $T$ is the sequence length: a major improvement over the conventional sequential approach. In contrast, chaotic or unpredictable systems exhibit poor conditioning, so parallel evaluation converges too slowly to be useful. Importantly, our theoretical analysis shows that predictable systems always yield well-conditioned optimization problems, whereas unpredictable systems lead to severe conditioning degradation. We validate our claims through extensive experiments, providing practical guidance on when nonlinear dynamical systems can be efficiently parallelized. We highlight predictability as a key design principle for parallelizable models.
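
To make the notion of predictability concrete, the following is a minimal sketch of estimating the LLE of a one-dimensional system by averaging the log-magnitude of its derivative along a trajectory. The logistic map $x_{t+1} = r x_t (1 - x_t)$, the function name `logistic_lle`, and all parameter values are illustrative assumptions and do not come from the paper. A negative estimate corresponds to a predictable system and a positive one to a chaotic system.

```python
import math


def logistic_lle(r, x0=0.2, n_steps=20000, burn_in=1000):
    """Estimate the largest Lyapunov exponent of x -> r * x * (1 - x).

    Hypothetical helper for illustration only. For a scalar map the LLE is
    the trajectory average of log|f'(x_t)|, with f'(x) = r * (1 - 2x).
    """
    x = x0
    # Discard transients so the average reflects long-run behavior.
    for _ in range(burn_in):
        x = r * x * (1.0 - x)

    total = 0.0
    for _ in range(n_steps):
        deriv = abs(r * (1.0 - 2.0 * x))
        # Guard against log(0) if the trajectory lands exactly on x = 0.5.
        total += math.log(max(deriv, 1e-300))
        x = r * x * (1.0 - x)
    return total / n_steps


if __name__ == "__main__":
    # r = 2.5: trajectories contract to the fixed point x* = 0.6, where
    # |f'(x*)| = 0.5, so the LLE is log(0.5) < 0 (predictable).
    print("r=2.5:", logistic_lle(2.5))
    # r = 4.0: the map is chaotic; its LLE is known analytically to be
    # log(2) > 0, and the numerical estimate should be close to it.
    print("r=4.0:", logistic_lle(4.0))
```

Under the abstract's claim, the first regime (negative LLE) is the one in which parallel evaluation would be expected to be well conditioned, while the second (positive LLE) is the one in which conditioning degrades.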
