Obeying constraints imposed by classical physics, we give optimal fine-grained algorithms for matrix multiplication and for problems involving graphs and mazes, where all calculations are done in 3-dimensional space. We assume that, whatever the technology, a bit requires a minimum volume and communication travels at a bounded speed. These assumptions imply that multiplying $n \times n$ matrices takes $\Omega(n^{2/3})$ time, and we show that this bound can be achieved by a fine-grained 3-d mesh of $n^2$ processors. Although the constants are impractically large, this is asymptotically faster than parallel implementations of Strassen's algorithm, and the lower bound shows that some claims about parallelizing faster serial algorithms are impossible in 3-space. If the matrices are not over a ring then multiplication can be done in $\Theta(n^{3/4})$ time by expanding to a mesh larger than the input. In 2-d (such as the surface of a chip) this approach is useless, and $\Theta(n)$ systolic algorithms are optimal even when the matrices are over a ring. Similarly, for path and maze problems there are approaches that are useful in 3-d but not in 2-d.
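A back-of-the-envelope sketch of where these exponents can come from, using only the two stated physical assumptions. This is a heuristic reconstruction, not necessarily the argument used in the paper. The symbols $V$ (volume), $A$ (area), $D$ (diameter of the region holding the data), $T$ (time) and $m$ (mesh side length) are introduced here for illustration only. The last line additionally assumes that, without ring structure, on the order of $n^3$ scalar operations must be performed.

$$
\begin{aligned}
\text{3-d input:}\quad & V = \Omega(n^2) \;\Rightarrow\; D = \Omega(V^{1/3}) = \Omega(n^{2/3}) \;\Rightarrow\; T = \Omega(n^{2/3}) \\
\text{2-d input:}\quad & A = \Omega(n^2) \;\Rightarrow\; D = \Omega(A^{1/2}) = \Omega(n) \;\Rightarrow\; T = \Omega(n) \\
\text{3-d, } m \times m \times m \text{ mesh, } n^3 \text{ operations:}\quad & T = \Omega\!\left(\max\!\left(m,\; \frac{n^3}{m^3}\right)\right), \text{ balanced at } m = n^{3/4} \;\Rightarrow\; T = \Theta(n^{3/4})
\end{aligned}
$$

In the first two lines, $n^2$ input values each occupy at least a constant volume (or area), an output entry can depend on inputs stored a distance $D$ apart, and bounded signal speed then gives $T = \Omega(D)$. In the third line, a mesh of side $m = n^{3/4}$ has $n^{9/4}$ processors, which is larger than the $n^2$ input. In 2-d the input alone already forces a diameter of $\Omega(n)$, so spreading the work over a larger mesh cannot improve on $\Theta(n)$.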