Constable: Improving Performance and Power Efficiency by Safely Eliminating Load Instruction Execution

Rahul Bera, Adithya Ranganathan, Joydeep Rakshit, Sujit Mahto, Anant V. Nori, Jayesh Gaur, Ataberk Olgun, Konstantinos Kanellopoulos, Mohammad Sadrosadati, Sreenivas Subramoney, Onur Mutlu

From arXiv. To appear in the Proceedings of the 51st International Symposium on Computer Architecture (ISCA).

Load instructions often limit instruction-level parallelism (ILP) in modern processors due to the data and resource dependences they cause. Prior techniques like Load Value Prediction (LVP) and Memory Renaming (MRN) mitigate load data dependence by predicting the data value of a load instruction. However, they fail to mitigate load resource dependence, as the predicted load instruction gets executed nonetheless. Our goal in this work is to improve ILP by mitigating both load data dependence and resource dependence. To this end, we propose a purely microarchitectural technique called Constable that safely eliminates the execution of load instructions. Constable dynamically identifies load instructions that have repeatedly fetched the same data from the same load address. We call such loads likely-stable. For every likely-stable load, Constable (1) tracks modifications to its source architectural registers and memory location via lightweight hardware structures, and (2) eliminates the execution of subsequent instances of the load instruction until there is a write to its source register or a store or snoop request to its load address. Our extensive evaluation using a wide variety of 90 workloads shows that Constable improves performance by 5.1% while reducing core dynamic power consumption by 3.4% on average over a strong baseline system that implements MRN and other dynamic instruction optimizations (e.g., move and zero elimination, constant and branch folding). In the presence of 2-way simultaneous multithreading (SMT), Constable's performance improvement increases to 8.8% over the baseline system. When combined with a state-of-the-art load value predictor (EVES), Constable provides an additional 3.7% and 7.8% average performance benefit over the load value predictor alone, in the baseline system without and with 2-way SMT, respectively.
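
The mechanism described in the abstract can be illustrated with a small software model. The sketch below is only a toy under stated assumptions, not the hardware design from the paper: the class `ToyConstable`, the per-PC table layout, the single source register per load, and the threshold `STABILITY_THRESHOLD` are all hypothetical names and choices made for illustration. It shows a load becoming likely-stable after repeatedly fetching the same value from the same address, then being eliminated until its source register is written or its address sees a store or snoop.

```python
from dataclasses import dataclass

# Hypothetical: number of repeated (address, value) matches before a load
# is treated as likely-stable. The abstract does not give a threshold.
STABILITY_THRESHOLD = 3

@dataclass
class LoadEntry:
    addr: int
    value: int
    src_reg: str          # assumption: one source register per load
    confidence: int = 0
    eliminable: bool = False

class ToyConstable:
    """Toy model: table indexed by load PC, memory as a dict."""

    def __init__(self, memory):
        self.memory = memory
        self.table = {}
        self.executed = 0
        self.eliminated = 0

    def load(self, pc, src_reg, addr):
        entry = self.table.get(pc)
        if entry is not None and entry.eliminable:
            # Safe to skip: no tracked modification since the last execution.
            self.eliminated += 1
            return entry.value
        value = self.memory.get(addr, 0)   # the load actually executes
        self.executed += 1
        if entry is not None and entry.addr == addr and entry.value == value:
            entry.confidence += 1
        else:
            entry = LoadEntry(addr, value, src_reg)
            self.table[pc] = entry
        if entry.confidence >= STABILITY_THRESHOLD:
            entry.eliminable = True
        return value

    def write_register(self, reg):
        # A write to a source register ends elimination for dependent loads.
        for entry in self.table.values():
            if entry.src_reg == reg:
                entry.eliminable = False

    def store(self, addr, value):
        self.memory[addr] = value
        self._invalidate(addr)

    def snoop(self, addr):
        # A snoop from another core may mean the location changed.
        self._invalidate(addr)

    def _invalidate(self, addr):
        for entry in self.table.values():
            if entry.addr == addr:
                entry.eliminable = False

demo = ToyConstable({0x1000: 42})
for _ in range(6):
    demo.load(pc=0x400, src_reg="rbp", addr=0x1000)
print(demo.executed, demo.eliminated)  # 4 2
```

In this toy, the first four instances execute (one to allocate the entry, three to reach the assumed threshold) and the remaining two are eliminated. A subsequent `demo.store(0x1000, 7)` clears the elimination flag, so the next instance executes again and observes the new value.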
