Power and cost constraints in the internet-of-things (IoT) extreme-edge and TinyML domains, coupled with increasing performance requirements, motivate a trend toward heterogeneous architectures. These designs use energy-efficient application-class host processors to coordinate compute-specialized multicore accelerators, amortizing the architectural costs of operating system support and external communication. This brief presents Cheshire, a lightweight and modular 64-bit Linux-capable host platform designed for the seamless plug-in of domain-specific accelerators. It features a unique low-pin-count DRAM interface, a last-level cache configurable as scratchpad memory, and a DMA engine enabling efficient data movement to or from accelerators or DRAM. It also provides numerous optional IO peripherals, including UART, SPI, I2C, VGA, and GPIOs. Cheshire's synthesizable RTL description, comprising all of its peripherals and its fully digital DRAM interface, is freely available as open source. We implemented and fabricated Cheshire as a silicon demonstrator called Neo in TSMC's 65 nm CMOS technology. At 1.2 V, Neo achieves clock frequencies of up to 325 MHz while not exceeding 300 mW in total power on data-intensive computational workloads. Its RPC DRAM interface consumes only 250 pJ/B and incurs only 3.5 kGE in area for its PHY while attaining a peak transfer rate of 750 MB/s at 200 MHz.
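The RPC DRAM figures quoted above can be related to one another with simple arithmetic. The following is a minimal sketch, not part of the original work: the function names are hypothetical, MB is taken as 10^6 bytes, and the power estimate assumes the 250 pJ/B energy cost holds unchanged at the peak transfer rate, which the abstract does not state.

```python
# Figures quoted in the abstract (MB assumed to mean 10**6 bytes).
PEAK_RATE_BYTES_PER_S = 750e6   # 750 MB/s peak transfer rate
INTERFACE_CLOCK_HZ = 200e6      # 200 MHz interface clock
ENERGY_PER_BYTE_J = 250e-12     # 250 pJ/B


def bytes_per_cycle(rate_bytes_per_s: float, clock_hz: float) -> float:
    """Average number of bytes moved per clock cycle at the given rate."""
    return rate_bytes_per_s / clock_hz


def interface_power_w(energy_per_byte_j: float, rate_bytes_per_s: float) -> float:
    """Power implied by a fixed energy-per-byte cost at a given transfer rate.

    Assumption: energy per byte is independent of the transfer rate.
    """
    return energy_per_byte_j * rate_bytes_per_s


if __name__ == "__main__":
    bpc = bytes_per_cycle(PEAK_RATE_BYTES_PER_S, INTERFACE_CLOCK_HZ)
    power = interface_power_w(ENERGY_PER_BYTE_J, PEAK_RATE_BYTES_PER_S)
    print(f"bytes per cycle at peak: {bpc:.2f}")
    print(f"implied interface power at peak: {power * 1e3:.1f} mW")
```

Under these assumptions, the peak rate corresponds to 3.75 bytes per interface clock cycle. The implied power is only an estimate, since the energy figure may have been measured at a different operating point than the peak rate.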