The RISC-V "V" extension introduces vector processing to the RISC-V architecture. Unlike most SIMD extensions, it supports long vectors, which can significantly improve the performance of many applications. In this paper, we present our ongoing research on implementing and optimizing a vectorized Winograd algorithm, used in convolutional layers, on RISC-V Vector (RISC-VV) processors. Our study identifies effective techniques for optimizing the Winograd kernels on RISC-VV using intrinsics, and shows how certain instructions deliver better performance. Our co-design findings suggest that the Winograd algorithm benefits from vector lengths up to 2048 bits and cache sizes up to 64 MB. We draw on our experience with Winograd to highlight potential enhancements to the standard that would simplify code generation and aid low-level programming. Finally, we share our experience of experimenting with forks of gem5 for RISC-VV and stress the importance of a mature software ecosystem for facilitating design space exploration and architectural optimization.