Estimating instruction-level throughput is critical for many applications: multimedia, low-latency networking, medical, automotive, avionic, and industrial control systems all rely on tight, accurately calculable timing bounds for their software. Unfortunately, how long a program may run, or whether it halts at all, cannot be answered in the general case. State-of-the-art throughput estimation tools therefore usually focus on a subset of operations and make several simplifying assumptions. Correctly identifying these constraints and the regions of interest in a program typically requires source code, specialized tools, and dedicated expert knowledge. Whenever a single instruction is modified, this process must be repeated, which incurs high costs when iteratively developing timing-sensitive code in practice. In this paper, we present MCAD, a novel and lightweight timing analysis framework that can identify the effects of code changes at the microarchitectural level for binary programs. MCAD provides accurate differential throughput estimates by emulating whole-program execution using QEMU and forwarding traces to LLVM for instruction-level analysis. This allows developers to iterate quickly, with low overhead, using common tools: telling execution paths that are less sensitive to changes apart from timing-critical paths takes only minutes with MCAD. To the best of our knowledge, this represents an entirely new capability that reduces turnaround times for differential throughput estimation by several orders of magnitude compared to state-of-the-art tools. Our detailed evaluation shows that MCAD scales to real-world applications like FFmpeg and Clang with millions of instructions, achieving < 3% geometric mean error compared to ground-truth timings from hardware performance counters on x86 and ARM machines.