Efficient instruction fetching is critical to performance, and it requires the primary front-end structures (the L1 instruction cache (L1i), the branch target buffer (BTB), and the instruction TLB (iTLB)) to hold the requisite information when it is needed. This paper proposes instruction presending, which traverses a high-level program map to identify instruction cache blocks, BTB entries, and iTLB entries and move them from the secondary to the primary structures in a "just in time" fashion. Empirical results demonstrate the efficacy of the proposed scheme. Compared to state-of-the-art instruction prefetching schemes, presending reduces the number of cycles in which instruction fetch is stalled by an order of magnitude, while operating with small primary BTBs. It is especially effective for benchmarks with a high base MPKI (misses per kilo-instruction), where movement from secondary to primary structures is frequent. This gain in fetch efficiency translates into performance improvements in cases where fetch efficiency matters.