Probabilistic models (PMs) are essential to advancing machine learning capabilities, particularly in safety-critical applications that involve reasoning and decision-making. Sampling-based Markov Chain Monte Carlo (MCMC) techniques are widely used for inference in these models. However, MCMC methods are computationally expensive and inherently difficult to parallelize, so they execute inefficiently on conventional CPU/GPU platforms. To overcome these challenges, this paper presents the Approximate Inference Accelerator (AIA), a multi-core RISC-V System-on-Chip (SoC) fabricated in Intel's 16 nm process technology. AIA is designed to give edge devices robust decision-making and reasoning abilities. The architecture combines a RISC-V host processor, which manages chip-to-chip data communication, with a 2D mesh of 16 custom, versatile RISC-V cores optimized for high-efficiency approximate inference. Each core features (i) custom instructions and datapath blocks for non-normalized Knuth-Yao (KY) sampling and for the interpolation of non-linear functions (e.g., logarithmic, exponential), and (ii) direct access to the register file of each neighboring core, which reduces the cost of the frequent data exchanges between nearby cores. To further exploit the parallelism available in MCMC algorithms, we developed a specialized compilation chain that enables efficient spatial mapping and scheduling across the cores.
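To make the idea of non-normalized Knuth-Yao sampling concrete, the following is a minimal software sketch in Python. It is not the AIA datapath or its custom instructions, which the abstract does not specify. It assumes non-negative integer weights that need not sum to one, pads them with a hypothetical "reject" outcome so that the total becomes a power of two, and then walks the Knuth-Yao tree column by column, consuming one random bit per level. The function name `knuth_yao_sample` and the `rand_bit` parameter are illustrative choices, not names from the paper.

```python
import random


def knuth_yao_sample(weights, rand_bit=lambda: random.getrandbits(1)):
    """Return index i with probability weights[i] / sum(weights).

    Assumption: weights are non-negative integers (unnormalized), so no
    division is needed to turn them into probabilities.
    """
    weights = list(weights)
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")

    # Smallest k >= 1 such that 2**k >= total.
    k = max(1, (total - 1).bit_length())
    # Extra last entry is a reject outcome; now every weight / 2**k has an
    # exact k-bit binary expansion and the padded weights sum to 2**k.
    padded = weights + [2 ** k - total]
    reject = len(weights)

    while True:
        d = 0  # position among the internal nodes of the current tree level
        outcome = None
        for col in range(k):
            d = 2 * d + rand_bit()      # descend one level using one random bit
            shift = k - 1 - col         # bit of each weight that forms this column
            for row, w in enumerate(padded):
                d -= (w >> shift) & 1   # each set bit is a leaf labelled `row`
                if d < 0:               # landed on a leaf
                    outcome = row
                    break
            if outcome is not None:
                break
        if outcome is not None and outcome != reject:
            return outcome
        # Reject outcome reached: restart the walk with fresh random bits.
```

In an MCMC setting such as Gibbs sampling, the `weights` argument would hold the unnormalized conditional scores of a discrete variable, for example `knuth_yao_sample([1, 2, 5])`. Passing a custom `rand_bit` callable makes the walk reproducible for testing.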