Software caches are an intrinsic component of almost every computer system. Consequently, caching algorithms, particularly eviction policies, are the topic of many papers. Almost all of these prior papers evaluate a caching algorithm by its hit ratio, namely the fraction of requests that are served from the cache rather than from disk. The hit ratio is viewed as a proxy for traditional performance metrics like system throughput or response time. Intuitively, it makes sense that a higher hit ratio should lead to higher throughput (and lower response time), since more requests are found in the cache (low access time) rather than on disk (high access time). This paper challenges that intuition. We show that increasing the hit ratio can actually hurt the throughput (and response time) for many caching algorithms. Our investigation follows a three-pronged approach involving (i) queueing modeling and analysis, (ii) implementation and measurement, and (iii) simulation to validate the accuracy of the queueing model. We also show that the phenomenon of throughput decreasing at higher hit ratios is likely to be more pronounced in future systems, where the trend is towards faster disks and more cores per CPU.
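The intuition the abstract challenges can be illustrated with a back-of-the-envelope capacity model. The sketch below is not the paper's queueing model; it is a minimal bottleneck calculation under entirely hypothetical numbers (per-request CPU cost, disk latency, core and disk counts), showing how a policy that achieves a higher hit ratio at the price of more CPU bookkeeping can deliver lower throughput once disks are fast:

```python
def throughput(hit_ratio, cpu_us_per_req, disk_us,
               cores=4, disk_concurrency=10):
    """Upper bound on sustainable request rate (req/s): the system is
    limited by whichever resource saturates first.

    cpu_us_per_req: CPU time spent per request, including the caching
    algorithm's bookkeeping (hypothetical numbers).
    """
    cpu_bound = cores / (cpu_us_per_req * 1e-6)
    disk_bound = disk_concurrency / ((1 - hit_ratio) * disk_us * 1e-6)
    return min(cpu_bound, disk_bound)

# Simple policy: lower hit ratio, cheap per-request bookkeeping.
simple = throughput(hit_ratio=0.80, cpu_us_per_req=2, disk_us=20)

# Complex policy: higher hit ratio, but more CPU work per request
# (eviction bookkeeping, lock contention).
complex_ = throughput(hit_ratio=0.90, cpu_us_per_req=5, disk_us=20)

# With a fast (20 microsecond) disk, the CPU becomes the bottleneck,
# and the simpler, lower-hit-ratio policy sustains higher throughput.
print(simple, complex_)
```

Rerunning the same calculation with a slow disk (e.g. `disk_us=10_000`) flips the outcome, which is consistent with the abstract's observation that the effect grows as disks get faster.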