The Branch Target Buffer (BTB) plays a critical role in efficient CPU branch prediction. Understanding its design and implementation provides valuable insight for both compiler design and the mitigation of hardware attacks such as Spectre. However, because the dominant CPUs, such as those from Intel, AMD, Apple, and Qualcomm, are proprietary, specific BTB implementation details are not publicly available. To address this limitation, several previous works have successfully reverse-engineered BTB properties, including capacity and associativity, primarily targeting Intel's x86 processors. To the best of our knowledge, however, no research has attempted to reverse-engineer and expose the BTB implementation of ARM processors. This project aims to fill that gap by exploring the BTB of ARM processors. Specifically, we investigate whether existing reverse-engineering techniques developed for Intel BTBs can be adapted to ARM. We reproduce the x86 methodology and identify the specific PMU events on ARM needed to support the reverse-engineering process. In our experiments, we examine the quad-core Cortex-A72 of the Raspberry Pi 4B. Our results show that the BTB capacity is 4K entries, that the set index spans bits 5 through 15 of the PC (11 bits in total), and that each set has 2 ways. The source code can be found at https://github.com/stefan1wan/BTB_ARM_RE.
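The reported geometry can be checked with a little arithmetic: 11 index bits give 2^11 = 2048 sets, and 2 ways per set give 4096 (4K) entries. The following is a minimal sketch of that calculation, not the code from the repository. It assumes that bit 0 is the least significant bit of the PC and that the set index is a plain bit slice with no hashing; the function and constant names are hypothetical.

```python
# Illustrative sketch only: names are hypothetical, and the plain bit-slice
# indexing (no hashing, bit 0 = least significant bit) is an assumption.

SET_INDEX_LOW_BIT = 5    # first PC bit of the set index (from the abstract)
SET_INDEX_HIGH_BIT = 15  # last PC bit of the set index (from the abstract)
WAYS_PER_SET = 2         # associativity (from the abstract)

INDEX_WIDTH = SET_INDEX_HIGH_BIT - SET_INDEX_LOW_BIT + 1  # 11 bits
NUM_SETS = 1 << INDEX_WIDTH                               # 2^11 sets
TOTAL_ENTRIES = NUM_SETS * WAYS_PER_SET                   # sets * ways


def btb_set_index(pc: int) -> int:
    """Extract PC bits [15:5] as the set index under the assumptions above."""
    return (pc >> SET_INDEX_LOW_BIT) & (NUM_SETS - 1)


def conflicting_addresses(base_pc: int, count: int) -> list[int]:
    """Return `count` addresses that share base_pc's set index.

    Stepping by 2^(high bit + 1) leaves bits [15:5] unchanged, so every
    address falls into the same set.
    """
    stride = 1 << (SET_INDEX_HIGH_BIT + 1)
    return [base_pc + i * stride for i in range(count)]


if __name__ == "__main__":
    print(f"sets={NUM_SETS}, ways={WAYS_PER_SET}, entries={TOTAL_ENTRIES}")
    for addr in conflicting_addresses(0x400020, 3):
        print(hex(addr), "-> set", btb_set_index(addr))
```

Under these assumptions, branches placed at more than two such same-set addresses would compete for the 2 ways of one set, which is the kind of conflict that associativity experiments rely on.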