Bulk-bitwise processing-in-memory (PIM), in which the memory array itself performs large bitwise operations in parallel, is an emerging form of computation with the potential to mitigate the memory wall problem. This paper examines the capabilities of bulk-bitwise PIM by constructing PIMDB, a fully digital system based on memristive stateful logic. PIMDB relies primarily on in-memory bulk-bitwise operations and is designed to accelerate a real-life workload: analytical processing of relational databases. We introduce a host-processor programming model that supports bulk-bitwise PIM in virtual memory, develop techniques to efficiently perform in-memory filtering and aggregation operations, and adapt the application data set to the memory. To evaluate bulk-bitwise PIM, we compare PIMDB with an equivalent in-memory database running on the same host system. We show that bulk-bitwise PIM substantially reduces the number of required memory read operations, accelerating TPC-H filter operations by 1.6$\times$--18$\times$ and full queries by 56$\times$--608$\times$, while reducing energy consumption by 1.7$\times$--18.6$\times$ and 0.81$\times$--12$\times$ for these benchmarks, respectively. Our extensive evaluation uses the gem5 full-system simulation environment. The simulations also evaluate cell endurance and show that the required endurance is within the range of existing RRAM devices.
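The abstract does not spell out how a filter or an aggregation is expressed in bulk-bitwise operations, so the following is a minimal sketch under stated assumptions, not PIMDB's implementation. It assumes one record per memory row, with bit i of a field stored in column i, so that a single operation acts on that column for every row at once (each Python `int` below stands for one such column across all rows). It also assumes NOR is the only in-memory primitive, in the spirit of memristive stateful logic, and that the host performs the final popcount during aggregation. The names `to_bit_slices`, `BulkBitwiseArray`, `filter_less_than`, and `masked_sum` are hypothetical.

```python
from typing import List


def to_bit_slices(values: List[int], width: int) -> List[int]:
    """Transpose unsigned field values into bit-slices (hypothetical layout).

    slices[i] is an int whose bit r is bit i of values[r]: one memory
    column holding bit i of the field for every row (record).
    """
    slices = [0] * width
    for r, v in enumerate(values):
        for i in range(width):
            if (v >> i) & 1:
                slices[i] |= 1 << r
    return slices


class BulkBitwiseArray:
    """Hypothetical array model: each operation acts on one column across
    all rows at once, and NOR is the only primitive."""

    def __init__(self, num_rows: int):
        self.mask = (1 << num_rows) - 1  # one bit per row
        self.ops = 0  # count of in-memory NOR operations issued

    def nor(self, a: int, b: int) -> int:
        self.ops += 1
        return ~(a | b) & self.mask

    def not_(self, a: int) -> int:
        return self.nor(a, a)

    def or_(self, a: int, b: int) -> int:
        return self.not_(self.nor(a, b))

    def and_(self, a: int, b: int) -> int:
        return self.nor(self.not_(a), self.not_(b))


def filter_less_than(arr: BulkBitwiseArray, slices: List[int], const: int) -> int:
    """Return a row bitmap of records whose field is < const.

    Bit-serial comparison from MSB to LSB. The host knows `const`, so it
    chooses which in-memory operations to issue for each bit position.
    """
    assert 0 <= const < (1 << len(slices)), "const must fit in the field width"
    lt = 0         # rows already known to be smaller than const
    eq = arr.mask  # rows whose higher-order bits all equal those of const
    for i in reversed(range(len(slices))):
        x = slices[i]
        if (const >> i) & 1:
            lt = arr.or_(lt, arr.and_(eq, arr.not_(x)))
            eq = arr.and_(eq, x)
        else:
            eq = arr.and_(eq, arr.not_(x))
    return lt


def masked_sum(arr: BulkBitwiseArray, slices: List[int], selected: int) -> int:
    """SUM(field) over selected rows: AND each bit-slice with the filter
    bitmap in memory, then weight the host-side popcount by 2**i."""
    total = 0
    for i, x in enumerate(slices):
        total += bin(arr.and_(x, selected)).count("1") << i
    return total


if __name__ == "__main__":
    values = [5, 12, 3, 9, 0, 15, 7, 8]
    slices = to_bit_slices(values, width=4)
    arr = BulkBitwiseArray(num_rows=len(values))
    sel = filter_less_than(arr, slices, 8)  # WHERE field < 8
    print([r for r in range(len(values)) if (sel >> r) & 1])  # [0, 2, 4, 6]
    print(masked_sum(arr, slices, sel))  # 15
```

In this model, the number of NOR operations grows with the field width and not with the number of rows. Each operation covers every row in parallel, and only the final bitmap or per-slice counts need to leave the array. That is one way to see why bulk-bitwise PIM can reduce memory reads compared with scanning every record on the host.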