Wireless baseband processing (WBP) is a key element of wireless communications, comprising a series of signal processing modules that improve data throughput and counter channel fading. Conventional hardware solutions, such as digital signal processors (DSPs) and, more recently, graphics processing units (GPUs), provide various degrees of parallelism, yet both fail to account for the cyclical and consecutive nature of WBP. Furthermore, symmetric multiprocessors (SMPs) cannot process the large amount of data in WBP quickly because their memory latency is unpredictable. To address these issues, we propose a hierarchical dataflow-driven architecture to accelerate WBP. A pack-and-ship approach is presented under a non-uniform memory access (NUMA) architecture to allow the subordinate tiles to operate in a bundled access-and-execute manner. We also propose a multi-level dataflow model and a related scheduling scheme to manage and allocate the heterogeneous hardware resources. Experimental results demonstrate that our prototype achieves $2\times$ and $2.3\times$ speedups in normalized throughput and single-tile clock cycles, respectively, compared with GPU and DSP counterparts on several critical WBP benchmarks. Additionally, a link-level throughput of $288$ Mbps can be achieved with a $45$-core configuration.