Latency-Based Tiling provides a systems-based approach to deriving an approximate tiling solution that maximizes locality while keeping compile time low. The method uses triangular loops to characterize the miss ratio scaling of a machine while avoiding prefetcher distortion. Miss ratio scaling captures the relationship between data access latency and working set size: a sharp increase in latency indicates that the data footprint has exceeded the capacity of a cache level. From these increases in latency we can determine approximate sizes for the L1, L2, and L3 caches. These sizes are expected to be underapproximations of a system's true cache sizes, which is consistent with the shared nature of cache in a multi-process system, as described in defensive loop tiling. Unlike auto-tuning, which can be effective but prohibitively slow, Latency-Based Tiling adds negligible compile-time overhead. The Rust implementation enables a hardware-agnostic approach that, combined with cache-timing-based techniques, yields a portable, memory-safe system that runs wherever Rust is supported. The tiling strategy is applied to a subset of the polyhedral model, where loop nests are tiled based on both the derived memory hierarchy and the observed data footprint per iteration.
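The two steps summarized above, locating cache boundaries from sharp latency increases and sizing a tile from the per-iteration footprint, can be sketched in Rust roughly as follows. This is a minimal illustration under stated assumptions, not the implementation described here: the `Sample` type, the function names `estimate_cache_capacities` and `tile_iterations`, and the `jump_ratio` threshold are all hypothetical, and the sketch takes latency measurements as input rather than producing them with triangular loops.

```rust
/// One measurement: a working-set size and the mean access latency observed for it.
/// (Hypothetical type; the measurement procedure itself is not shown.)
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Sample {
    pub working_set_bytes: usize,
    pub latency_ns: f64,
}

/// Scans samples ordered by increasing working-set size and returns the size of
/// the last sample *before* each sharp latency increase.
///
/// An increase counts as "sharp" when latency grows by at least `jump_ratio`
/// between two consecutive samples (an assumed heuristic). Reporting the size
/// before the jump makes each estimate an underapproximation of the capacity.
pub fn estimate_cache_capacities(samples: &[Sample], jump_ratio: f64) -> Vec<usize> {
    samples
        .windows(2)
        .filter(|w| w[0].latency_ns > 0.0 && w[1].latency_ns / w[0].latency_ns >= jump_ratio)
        .map(|w| w[0].working_set_bytes)
        .collect()
}

/// Returns the largest number of loop iterations whose combined data footprint
/// fits in `capacity_bytes`, never less than one iteration.
/// Returns `None` when the per-iteration footprint is zero, since no tile size
/// can be derived from it.
pub fn tile_iterations(capacity_bytes: usize, footprint_per_iter_bytes: usize) -> Option<usize> {
    if footprint_per_iter_bytes == 0 {
        return None;
    }
    Some((capacity_bytes / footprint_per_iter_bytes).max(1))
}
```

Given a latency curve that jumps after 32 KiB and again after 256 KiB, `estimate_cache_capacities` would report those two sizes as approximate L1 and L2 capacities, and `tile_iterations` would then divide the chosen capacity by the bytes touched per iteration to bound the tile. A fixed ratio threshold is the simplest choice for a sketch; a real detector would likely need to tolerate measurement noise.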