Modern data-driven applications expose the limitations of von Neumann architectures: extensive data movement, low throughput, and poor energy efficiency. Accelerators improve performance but lack flexibility and require data transfers. Existing compute-in-memory and compute-near-memory solutions mitigate these issues but face usability challenges due to data placement constraints. We propose a novel cache architecture that doubles as a tightly-coupled compute-near-memory coprocessor. Our \riscv cache controller executes custom instructions from the host CPU by dispatching vector operations to near-memory vector processing units within the cache memory subsystem. This architecture abstracts memory synchronization and data mapping away from application software while offering software-based \isa extensibility. When operating on 8-bit data, our implementation achieves a $30\times$ to $84\times$ performance improvement over the same system with a traditional cache executing a worst-case 32-bit CNN workload, with only $41.3\%$ area overhead.
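To make the "cache as coprocessor" flow concrete, the toy model below shows one possible way a cache controller could turn a single host-issued custom instruction into line-sized vector operations on near-memory units. Everything in it is a hypothetical illustration, not the architecture described here: the class `CNMCache`, the instruction `custom_vadd_u8`, the 64-byte line size, the four banks with one vector unit each, and the absence of write-back are all assumptions. The sketch shows two of the claimed properties: operands are pulled into the cache on demand, so software does not have to place data in a special region, and results are read back through the cache, so synchronization is hidden from the program.

```python
from dataclasses import dataclass, field

LINE_BYTES = 64  # hypothetical cache-line size
NUM_BANKS = 4    # hypothetical: one near-memory vector unit per cache bank


@dataclass
class CNMCache:
    """Toy cache whose controller also executes custom vector instructions.

    Hypothetical sketch only: sizes, names and the instruction format are
    illustrative assumptions. Eviction and write-back are not modeled.
    """
    memory: bytearray
    lines: dict = field(default_factory=dict)  # line base address -> line data
    fills: int = 0                             # line fills from backing memory
    dispatches: list = field(default_factory=lambda: [0] * NUM_BANKS)

    def _line(self, addr: int) -> bytearray:
        # On a miss, fetch the whole line so software never has to map
        # operands into a special memory region itself.
        base = addr - addr % LINE_BYTES
        if base not in self.lines:
            self.lines[base] = bytearray(self.memory[base:base + LINE_BYTES])
            self.fills += 1
        return self.lines[base]

    def _vpu_add_u8(self, bank: int, dst: bytearray, a: bytearray, b: bytearray) -> None:
        # Near-memory vector unit of `bank`: element-wise 8-bit add with wraparound.
        self.dispatches[bank] += 1
        for i in range(LINE_BYTES):
            dst[i] = (a[i] + b[i]) & 0xFF

    def custom_vadd_u8(self, dst: int, src1: int, src2: int, n: int) -> None:
        """Hypothetical custom instruction: dst[i] = (src1[i] + src2[i]) mod 256.

        The controller splits the request into line-sized vector operations and
        dispatches each one to the unit of the bank holding the destination line.
        """
        if any(x % LINE_BYTES for x in (dst, src1, src2, n)):
            raise ValueError("addresses and length must be line-aligned")
        for off in range(0, n, LINE_BYTES):
            bank = ((dst + off) // LINE_BYTES) % NUM_BANKS
            self._vpu_add_u8(bank, self._line(dst + off),
                             self._line(src1 + off), self._line(src2 + off))

    def read(self, addr: int, n: int) -> bytes:
        # Host reads go through the cache, so results are visible without
        # any explicit synchronization by the application.
        return bytes(self._line(addr + i)[(addr + i) % LINE_BYTES] for i in range(n))


if __name__ == "__main__":
    mem = bytearray(1024)
    mem[0:128] = bytes(range(128))     # first 8-bit operand vector
    mem[256:384] = bytes([200] * 128)  # second 8-bit operand vector
    cache = CNMCache(mem)
    cache.custom_vadd_u8(dst=512, src1=0, src2=256, n=128)
    print(list(cache.read(512, 4)))    # [200, 201, 202, 203]
    print(cache.dispatches)            # [1, 1, 0, 0]
```

In this sketch the host issues one instruction for a 128-byte operation and the controller fans it out to two vector units; a real design would also have to handle unaligned operands, eviction, and coherence with the host's own cache accesses.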