Searches for signals at low signal-to-noise ratios frequently rely on the Fast Fourier Transform (FFT). For high-throughput searches, we consider the FFT on the homogeneous mesh of Processing Elements (PEs) of a wafer-scale engine (WSE). To minimize memory overhead in the inherently non-local FFT algorithm, we introduce a new synchronous slide operation ({\em Slide}) that exploits the fast interconnect between adjacent PEs. Preliminary benchmarks on the CS-2 WSE show that Slide execution times scale linearly with array size, demonstrating the feasibility of compute-limited performance. The proposed implementation appears well suited to accelerating FFT-based signal processing in multi-messenger astronomy and opening its full discovery potential.