A MICRO 2024 best paper runner-up publication (the Mess paper), which received all three artifact badges (including ``Reproducible''), proposes a new benchmark for evaluating real and simulated memory system performance. The publication contends that Ramulator 2.0 and DAMOV (ZSim+Ramulator), along with other existing memory system simulators, ``poorly resemble the actual system performance'' and asserts that its own simulator is better. In this paper, we show that the Mess paper has 1) demonstrable technical misconfigurations, 2) methodological errors in interpreting simulation statistics, and 3) an incomplete artifact that makes its key results irreproducible. We demonstrate that the Ramulator 2.0 simulation results reported in the Mess paper are incorrect because of multiple configuration errors, not because of the inherent simulation inaccuracy that the Mess paper claims. We show that, when Ramulator 2.0 is correctly configured, its simulated memory system performance actually resembles real system characteristics well; thus, a key claimed contribution of the Mess paper is factually incorrect. We also find that the DAMOV simulation results in the Mess paper rely on the wrong simulation statistics, which are unrelated to the simulated DRAM performance. We show that, contrary to the Mess paper's claim, DAMOV's simulated DRAM latency is not constant. Moreover, the Mess paper's artifact repository lacks the sources necessary to fully reproduce all of the paper's results: the experiment scripts use simulator executables and other resources that are neither described in the Mess paper nor included in the artifact repository. We strongly encourage the computer architecture community to consider our corrections to the Ramulator 2.0 and DAMOV results of the Mess paper, to prevent the propagation of inaccurate and misleading results and to maintain the reliability of the scientific record.