We find a succinct expression for computing the sequence $x_t = a_t x_{t-1} + b_t$ in parallel with two prefix sums, given $t = (1, 2, \dots, n)$, $a_t \in \mathbb{R}^n$, $b_t \in \mathbb{R}^n$, and initial value $x_0 \in \mathbb{R}$. On $n$ parallel processors, the computation of $n$ elements incurs $\mathcal{O}(\log n)$ time and $\mathcal{O}(n)$ space. Sequences of this form are ubiquitous in science and engineering, making efficient parallelization useful for a vast number of applications. We implement our expression in software, test it on parallel hardware, and verify that it executes faster than sequential computation by a factor of $\frac{n}{\log n}$.
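The abstract does not spell out the expression itself, so the following is only a minimal sketch of one way such a recurrence can be reduced to two prefix scans, not necessarily the formulation or the implementation used in this work. It assumes every $a_t$ is nonzero and writes $A_t = a_1 a_2 \cdots a_t$, so that $x_t = A_t \left( x_0 + \sum_{i=1}^{t} b_i / A_i \right)$. The running product $A_t$ is the exponential of a prefix sum of $\log a_t$, which is why it counts as a prefix sum. The function names `sequential` and `via_prefix_scans` are hypothetical, and `itertools.accumulate` runs sequentially here; on parallel hardware each call would be replaced by a parallel scan.

```python
from itertools import accumulate
import math
import operator


def sequential(a, b, x0):
    """Reference: evaluate x_t = a_t * x_{t-1} + b_t one step at a time."""
    xs, x = [], x0
    for a_t, b_t in zip(a, b):
        x = a_t * x + b_t
        xs.append(x)
    return xs


def via_prefix_scans(a, b, x0):
    """Same sequence from two prefix scans; assumes every a_t is nonzero."""
    # Scan 1: running products A_t = a_1 * ... * a_t
    # (equivalently, exp of a prefix sum of log a_t).
    A = list(accumulate(a, operator.mul))
    # Scan 2: prefix sum of b_t / A_t.
    S = list(accumulate(b_t / A_t for b_t, A_t in zip(b, A)))
    # Combine elementwise: x_t = A_t * (x_0 + S_t).
    return [A_t * (x0 + S_t) for A_t, S_t in zip(A, S)]


# Small illustrative inputs (made up for this sketch).
a = [0.5, 2.0, -1.5, 0.25]
b = [1.0, -3.0, 0.5, 2.0]
x0 = 4.0

expected = sequential(a, b, x0)
actual = via_prefix_scans(a, b, x0)
assert all(
    math.isclose(p, q, rel_tol=1e-9, abs_tol=1e-12)
    for p, q in zip(actual, expected)
)
```

Dividing by $A_t$ directly can overflow or underflow when the running product becomes very large or very small, which is one reason to carry out both scans in log space in practice.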