Real-time systems, particularly those used in domains such as automated driving, are increasingly adopting neural networks. This trend creates a need for high-performance hardware with predictable timing behavior. State-of-the-art real-time hardware often suffers from limited memory and compute resources, whereas modern AI accelerators typically lack the required predictability because of memory interference. We present a new hardware architecture that bridges this gap between performance and predictability. The architecture is a multi-core vector processor built from predictable cores, each equipped with local scratchpad memories. A central management core orchestrates access to the shared external memory according to a statically determined schedule. To evaluate the proposed architecture, we analyze several variants of our parameterized design and compare them to a baseline consisting of a single-core vector processor with large vector registers. We find that configurations with a larger number of smaller cores achieve better performance, owing to increased effective memory bandwidth and higher clock frequencies. Crucially for real-time systems, execution time fluctuation remains very low, demonstrating the platform's time predictability.
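The idea of a statically determined schedule for shared external memory can be illustrated with a minimal sketch. The abstract does not specify the form of the schedule, so the repeating, back-to-back slot layout used here is an assumption, and the names `Slot`, `build_schedule`, and `grant_time` are hypothetical. The sketch shows only why such a scheme is predictable: the cycle at which a core is granted the memory port depends on the schedule alone, not on the traffic of other cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    core: int    # core granted the external-memory port (hypothetical model)
    start: int   # first cycle of the slot within one schedule period
    length: int  # slot length in cycles


def build_schedule(transfer_cycles):
    """Lay out one slot per core, back to back, in core order (assumed policy)."""
    slots, t = [], 0
    for core, length in enumerate(transfer_cycles):
        slots.append(Slot(core, t, length))
        t += length
    return slots, t  # t is the period of the repeating schedule


def grant_time(slots, period, core, request_cycle):
    """Cycle at which `core` is next granted the port at or after `request_cycle`."""
    slot = next(s for s in slots if s.core == core)
    rounds, offset = divmod(request_cycle, period)
    if offset > slot.start:
        rounds += 1  # this period's slot has already begun; wait for the next one
    return rounds * period + slot.start


# Three cores with illustrative transfer lengths of 4, 4, and 8 cycles.
slots, period = build_schedule([4, 4, 8])
# Worst-case wait for core 2 over every possible request offset in one period.
worst_wait = max(grant_time(slots, period, 2, c) - c for c in range(period))
```

Under this assumed policy the worst-case wait of any core is bounded by the schedule period, which is what makes the memory access time analyzable offline.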