The Unexpected Efficiency of Bin Packing Algorithms for Dynamic Storage Allocation in the Wild: An Intellectual Abstract

Recent work has shown that it is meaningful to view allocators as black-box two-dimensional bin packing (2DBP) solvers. For instance, there exists a 2DBP-based fragmentation metric that often correlates monotonically with maximum resident set size (RSS). Given the field's lack of consensus on how to define fragmentation, as well as the immense value of physical memory savings, we are motivated to compare allocator-generated placements against their 2DBP-devised, makespan-optimizing counterparts. Of course, allocators must operate online, while 2DBP algorithms work on complete request traces; but since both sides optimize criteria related to minimizing memory wastage, studying their relationship remains of both intellectual and practical interest. Unfortunately, no implementations of 2DBP algorithms for dynamic storage allocation (DSA) are available. This paper presents a first, though partial, implementation of the state of the art. We validate its functionality by comparing the makespan of its outputs to the theoretical upper bound provided by the original authors. Along the way, we identify and document key details to assist analogous future efforts. Our experiments comprise 4 modern allocators and 8 real application workloads. We make several notable observations from our empirical evidence. In terms of makespan, allocators outperform Robson's worst-case lower bound $93.75\%$ of the time. In $87.5\%$ of cases, GNU's \texttt{malloc} implementation demonstrates equivalent or superior performance to the 2DBP state of the art, even though the latter operates offline. Most surprisingly, the 2DBP algorithm proves competent in terms of fragmentation, producing up to $2.46\times$ better solutions. Future research can leverage such insights towards memory-targeting optimizations.
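To make the 2DBP view of allocation concrete, the following is a minimal sketch and not the implementation described in this paper. It assumes that each request is a rectangle whose width is its lifetime and whose height is its size, that a placement assigns every request a start address, and that makespan means the highest address the placement touches. The names `Block`, `makespan`, `max_load`, and `is_valid` are hypothetical, and the toy trace is invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One request viewed as a rectangle (hypothetical layout, not the paper's)."""
    start: int   # allocation time
    end: int     # deallocation time (exclusive)
    size: int    # bytes requested
    offset: int  # address chosen by the placement


def makespan(blocks):
    """Highest address touched by the placement (assumed meaning of makespan)."""
    return max((b.offset + b.size for b in blocks), default=0)


def max_load(blocks):
    """Peak sum of live sizes over time; no valid placement can have a smaller makespan."""
    events = []
    for b in blocks:
        events.append((b.start, b.size))
        events.append((b.end, -b.size))
    # Negative deltas sort first, so frees at time t precede allocations at t.
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


def is_valid(blocks):
    """True if no two blocks overlap in both time and address range."""
    for i, a in enumerate(blocks):
        for b in blocks[i + 1:]:
            in_time = a.start < b.end and b.start < a.end
            in_addr = a.offset < b.offset + b.size and b.offset < a.offset + a.size
            if in_time and in_addr:
                return False
    return True


# Toy trace: the third request arrives exactly when the second is freed.
# An online placement that has already put it above the freed hole ...
online = [Block(0, 4, 4, 0), Block(0, 2, 2, 4), Block(2, 4, 3, 6)]
# ... versus an offline placement that reuses the freed address range.
offline = [Block(0, 4, 4, 0), Block(0, 2, 2, 4), Block(2, 4, 3, 4)]

if __name__ == "__main__":
    for name, placement in (("online", online), ("offline", offline)):
        print(name, is_valid(placement), makespan(placement), max_load(placement))
```

In this toy trace both placements are valid, but the offline one reaches the peak-load lower bound while the online one does not. The ratio of makespan to peak load is one simple way to quantify the gap; it is offered here only as an illustration and is not claimed to be the fragmentation metric referred to above.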
