Image processing algorithms are prime targets for hardware acceleration because they are commonly used in resource- and power-limited applications. Today's image processing accelerator designs make rigid assumptions about algorithm structure and/or on-chip memory resources. As a result, they either have narrow applicability or yield inefficient designs. This paper presents a compiler framework that automatically generates memory- and power-efficient image processing accelerators. Programmers describe generic image processing algorithms in a domain-specific language and specify the available on-chip memory structures. Our framework then formulates a constrained optimization problem that minimizes on-chip memory usage while maintaining the theoretical maximum throughput. The key challenge we address is analytically expressing the throughput bottleneck, on-chip memory contention, which enables lightweight compilation. FPGA prototyping and ASIC synthesis show that, compared to existing approaches, accelerators generated by our framework reduce on-chip memory usage and/or power consumption by double-digit percentages.