In Near Memory Processing (NMP), processing elements (PEs) are placed near the 3D memory, reducing unnecessary data transfers between the CPU and the memory. However, because the CPUs and the NMP PEs share a memory space, maintaining coherence between them is a challenge. Most existing work maintains coherence at either a fine-grained or a coarse-grained instruction granularity for the offloaded code blocks. We observe that for most NMP-offloaded instructions, coherence conflicts are rare, so waiting for the coherence transaction to complete hurts performance. We construct an analytical model of an existing coherence strategy, CONDA, that is accurate to within 4%. The model identifies the key parameters that govern performance: the granularity of the offloaded code, the probability of conflicts, the transaction times, and the commit time. Using this model, the paper identifies prospective optimizations for CONDA and proposes a new coherence scheme, MRCN: Monitored Rollback Coherence for NMP. MRCN addresses the coherence issue while eliminating unnecessary re-executions, with limited hardware overhead. MRCN is evaluated on synthetic benchmarks as well as Rodinia benchmarks. The analytical results are within 4% of the simulation results. MRCN improves on the CONDA strategy by up to 25% for the same benchmark under different execution conditions.
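To make the role of the parameters named above concrete, the following is a minimal sketch of a generic execute-then-validate cost model. It is not the paper's analytical model of CONDA, which the abstract does not give. It assumes that an offloaded block runs optimistically, that a coherence transaction then checks for conflicts, that a conflict (independent per attempt, with fixed probability) forces a full re-execution, and that a successful attempt ends with a commit. The function name and all parameter names are hypothetical.

```python
def expected_block_time(t_exec, t_txn, t_commit, p_conflict):
    """Expected time to complete one offloaded block (illustrative only).

    Assumptions (not taken from the paper):
      t_exec     -- time to execute the block once; grows with granularity
      t_txn      -- time of the coherence transaction that validates a run
      t_commit   -- time to commit results after a conflict-free run
      p_conflict -- probability that a single attempt hits a conflict,
                    assumed independent across attempts
    """
    if not 0.0 <= p_conflict < 1.0:
        raise ValueError("p_conflict must be in [0, 1)")
    # Attempts until the first conflict-free run follow a geometric
    # distribution, so the expected number of attempts is 1 / (1 - p).
    expected_attempts = 1.0 / (1.0 - p_conflict)
    # Every attempt pays execution plus validation; the commit is paid once.
    return expected_attempts * (t_exec + t_txn) + t_commit


if __name__ == "__main__":
    # Hypothetical timings in arbitrary units, not measured values.
    for p in (0.0, 0.1, 0.5):
        print(p, expected_block_time(t_exec=100, t_txn=20, t_commit=10, p_conflict=p))
```

Under these assumptions, a larger block raises the cost of each re-execution, while a low conflict probability keeps the expected number of attempts close to one, which is the regime the abstract describes as common.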