Although deep learning has demonstrated remarkable capability in learning from unstructured data, modern tree-based ensemble models remain superior at extracting relevant information from structured datasets. While several efforts have been made to accelerate tree-based models, the inherent characteristics of these models pose significant challenges for conventional accelerators. Recent research leveraging content-addressable memory (CAM) offers a promising solution for accelerating tree-based models, yet existing designs suffer from excessive memory consumption and low utilization. This work addresses these challenges by introducing RETENTION, an end-to-end framework that significantly reduces the CAM capacity required for tree-based model inference. We propose an iterative pruning algorithm with a novel pruning criterion tailored to bagging-based models (e.g., Random Forest), which minimizes model complexity while ensuring controlled accuracy degradation. Additionally, we present a tree mapping scheme that incorporates two innovative data placement strategies to alleviate the memory redundancy caused by the widespread use of don't-care states in CAM. Experimental results show that the tree mapping scheme alone reduces the CAM capacity requirement by $1.46\times$ to $21.30\times$, while the full RETENTION framework achieves a $4.35\times$ to $207.12\times$ reduction with less than 3\% accuracy loss. These results demonstrate that RETENTION is highly effective in minimizing CAM resource demand, offering a resource-efficient direction for tree-based model acceleration.
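To make the memory-redundancy problem concrete, the sketch below illustrates the common CAM encoding of tree-based models that the abstract alludes to: each root-to-leaf path becomes one CAM row storing a per-feature interval, and every feature the path never tests is stored as a don't-care. All names and structures here are illustrative assumptions for exposition, not RETENTION's actual mapping scheme.

```python
# Hedged sketch (not the paper's scheme): encode decision-tree paths as
# CAM rows and measure how many stored slots remain full don't-cares.
import math

def paths_to_cam_rows(paths, n_features):
    """paths: list of (constraints, label) pairs, where constraints is a
    root-to-leaf list of (feature_idx, threshold, is_less_equal) tuples."""
    rows = []
    for constraints, label in paths:
        # Start every feature slot as a full don't-care interval.
        row = [(-math.inf, math.inf)] * n_features
        for f, thr, le in constraints:
            lo, hi = row[f]
            if le:            # path took the "feature <= threshold" branch
                hi = min(hi, thr)
            else:             # path took the "feature > threshold" branch
                lo = max(lo, thr)
            row[f] = (lo, hi)
        rows.append((row, label))
    return rows

def dont_care_ratio(rows):
    """Fraction of stored intervals that are still unbounded don't-cares."""
    total = wild = 0
    for row, _ in rows:
        for lo, hi in row:
            total += 1
            wild += (lo == -math.inf and hi == math.inf)
    return wild / total

# Tiny hypothetical tree: depth 2 over 4 features. Shallow paths leave
# most feature slots as don't-cares, i.e., low CAM utilization.
paths = [
    ([(0, 5.0, True)], "A"),                    # x0 <= 5
    ([(0, 5.0, False), (2, 1.5, True)], "B"),   # x0 > 5 and x2 <= 1.5
    ([(0, 5.0, False), (2, 1.5, False)], "C"),  # x0 > 5 and x2 > 1.5
]
rows = paths_to_cam_rows(paths, n_features=4)
print(dont_care_ratio(rows))  # majority of slots are don't-cares
```

In this toy example, 7 of the 12 stored intervals carry no constraint at all; it is exactly this kind of redundancy, multiplied across an ensemble, that data placement strategies like RETENTION's aim to eliminate.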