Improving the Representativeness of Simulation Intervals for the Cache Memory System

Accurate simulation techniques are indispensable for efficiently proposing new memory or architectural organizations. As implementing new hardware concepts in real systems is often not feasible, cycle-accurate simulators are commonly used together with a set of benchmarks. However, detailed simulators may take too long to run these programs to completion, so several techniques aimed at reducing simulation time are usually employed. These schemes select fragments of the source code considered representative of the entire application's behaviour (mainly in terms of performance, but without fully considering the behaviour of the cache memory levels), and only these intervals are simulated. Our hypothesis is that the simulation windows currently employed when evaluating microarchitectural proposals, especially those involving the last level cache (LLC), do not reproduce the overall cache behaviour observed during the entire execution, potentially leading to wrong conclusions about the real performance of the proposals assessed. In this work, we first validate this hypothesis by evaluating different cache replacement policies using various typical simulation approaches. We then propose a simulation strategy, based on the applications' LLC activity, which mimics the overall behaviour of the cache much more closely than conventional simulation intervals. Our proposal allows a fairer comparison between cache-related approaches: with respect to the full simulation, it reports, on average, a number of changes in the relative order among the policies assessed that is more than 30\% lower than that of conventional strategies, while keeping the simulation time largely unchanged and without losing accuracy in terms of performance, especially for memory-intensive applications.
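
The comparison metric mentioned above, the number of changes in the relative order among policies with respect to the full simulation, can be illustrated with a minimal sketch. The abstract does not specify how these changes are counted, so the pairwise definition below is an assumption; the function name `order_changes`, the policy names, and all numeric values are hypothetical and are not results from this work.

```python
from itertools import combinations


def order_changes(full, sampled):
    """Count policy pairs whose relative order differs between two runs.

    `full` and `sampled` map a policy name to a metric value (for example,
    LLC miss rate). A pair counts as changed when the sign of the difference
    between the two policies flips; ties are not counted as changes.
    """
    changes = 0
    for a, b in combinations(sorted(full), 2):
        full_diff = full[a] - full[b]
        sampled_diff = sampled[a] - sampled[b]
        if full_diff * sampled_diff < 0:
            changes += 1
    return changes


# Made-up LLC miss rates, for illustration only.
full_run = {"LRU": 0.42, "SRRIP": 0.38, "DRRIP": 0.35}
interval_run = {"LRU": 0.40, "SRRIP": 0.41, "DRRIP": 0.36}

print(order_changes(full_run, interval_run))
```

Under this assumed definition, a simulation interval that ranks every pair of policies as the full run does yields zero changes, so a lower count indicates a more representative interval.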
