Triple Modular Redundancy (TMR) has traditionally been used to ensure complete tolerance to a single fault or a single faulty processing unit, where the processing unit may be a circuit or a system. However, TMR incurs more than 200% overhead in area and power compared to a single processing unit. Alternative redundancy approaches have therefore been proposed in the literature to mitigate the design overheads of TMR, but they provide only partial or moderate fault tolerance. This research presents FAC, a new fault-tolerant design approach based on approximate computing that has the same fault tolerance as TMR while significantly reducing the design metrics of the physical implementation. FAC is suited to a wide range of error-tolerant applications. Here, the performance of TMR and FAC is evaluated for a digital image processing application, and the image processing results confirm the usefulness of FAC. When an example processing unit was implemented using a 28-nm CMOS technology, FAC achieved a 15.3% reduction in delay, a 19.5% reduction in area, and a 24.7% reduction in power compared to TMR.
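As background for the TMR baseline, the single-fault masking it provides can be sketched in software as a bitwise 2-of-3 majority vote over three copies of a processing unit. This is a minimal illustration only, not the FAC architecture, whose internals are not described in this abstract. The names `processing_unit`, `tmr_output`, and `fault_mask`, as well as the 8-bit increment used as the example unit, are assumptions made for the sketch.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant outputs."""
    return (a & b) | (b & c) | (a & c)


def processing_unit(x: int) -> int:
    """Hypothetical 8-bit processing unit: increment modulo 256."""
    return (x + 1) & 0xFF


def tmr_output(x: int, fault_mask: int = 0) -> int:
    """Run three copies of the unit and vote on their outputs.

    fault_mask (an assumption of this sketch) flips the selected output
    bits of one copy to emulate a single faulty processing unit.
    """
    good = processing_unit(x)
    faulty = good ^ fault_mask  # corrupt exactly one of the three copies
    return majority_vote(faulty, good, good)
```

Because only one copy is corrupted, the voter returns the fault-free result no matter which bits `fault_mask` flips. The cost is three units plus a voter, which is where the overhead of more than 200% comes from.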