We investigate the utility of augmenting a single-pipeline microprocessor by adding a second copy of its execution pipeline in parallel with the existing one. The resulting dual-hardware-threaded microprocessor has two identical, independent, single-issue, in-order execution pipelines (hardware threads) that share a common memory subsystem (instruction and data caches together with a memory management unit). From a design perspective, assembly and verification of the dual-threaded processor are simplified by reusing existing verified implementations of the execution pipeline and the memory unit. Because the memory unit is shared by the two hardware threads, the area overhead of adding the second hardware thread is 25\% of the area of the existing single-threaded processor. Using an FPGA implementation, we evaluate the performance of the dual-threaded processor relative to the single-threaded one. On applications that can be parallelized, we observe speedups of 1.6X to 1.88X. For applications that are not parallelizable, the speedup is more modest. We also observe that the performance of the dual-threaded processor degrades on applications that generate large numbers of cache misses.
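
As a rough illustration of what these figures imply, the reported speedups can be normalized by the added area. This is only a sketch: it assumes area is the sole cost of interest, and the symbols $S$ (measured speedup) and $\eta$ (performance per unit area relative to the single-threaded processor) are introduced here and are not notation from the paper.

\[
\eta = \frac{S}{1 + 0.25}, \qquad
\eta_{S=1.6} = \frac{1.6}{1.25} = 1.28, \qquad
\eta_{S=1.88} = \frac{1.88}{1.25} \approx 1.50 .
\]

Under these assumptions, the dual-threaded design yields more performance per unit area than the single-threaded one whenever $S > 1.25$, which holds across the reported range for parallelizable applications.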