We present CausalSim, a causal framework for unbiased trace-driven simulation. Current trace-driven simulators assume that the interventions being simulated (e.g., a new algorithm) do not affect the validity of the traces. However, real-world traces are often biased by the choices algorithms make during trace collection, so replaying traces under an intervention may lead to incorrect results. CausalSim addresses this challenge by learning a causal model of the system dynamics together with latent factors that capture the underlying system conditions during trace collection. It learns these models from an initial randomized controlled trial (RCT) run under a fixed set of algorithms, and then applies them to remove biases from trace data when simulating new algorithms. Key to CausalSim is mapping unbiased trace-driven simulation to a tensor completion problem with extremely sparse observations. By exploiting a basic distributional invariance property present in RCT data, CausalSim enables a novel tensor completion method despite the sparsity of observations. Our extensive evaluation of CausalSim on both real and synthetic datasets, including more than ten months of real data from the Puffer video streaming system, shows that it improves simulation accuracy, reducing errors by 53% and 61% on average compared to expert-designed and supervised learning baselines, respectively. Moreover, CausalSim provides markedly different insights about adaptive bitrate (ABR) algorithms compared to the biased baseline simulator, which we validate with a real deployment.
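The bias described in the abstract, and the distributional invariance that an RCT provides, can be illustrated with a toy sketch. This is not CausalSim's algorithm: the dynamics function `observed_throughput`, the policies `A` and `B`, their chunk sizes, and the uniform capacity distribution are all hypothetical choices made for illustration, and the sketch assumes the dynamics are known, whereas CausalSim learns them from data.

```python
import random
import statistics

random.seed(0)

def observed_throughput(capacity, chunk_size):
    # Hypothetical dynamics: larger chunks achieve a larger share of the
    # latent path capacity, so the trace depends on the policy's action.
    return capacity * chunk_size / (chunk_size + 1.0)

# Hypothetical policies, each always requesting a fixed chunk size.
POLICY_CHUNK = {"A": 1.0, "B": 4.0}

# Toy RCT: every session draws a latent capacity, and the policy is
# assigned at random, independently of that capacity.
sessions = []
for _ in range(20000):
    capacity = random.uniform(1.0, 5.0)
    policy = random.choice(["A", "B"])
    trace = observed_throughput(capacity, POLICY_CHUNK[policy])
    sessions.append((policy, capacity, trace))

# Distributional invariance: because assignment is random, the latent
# capacity has (approximately) the same distribution in both arms.
mean_cap = {
    p: statistics.mean(c for q, c, _ in sessions if q == p)
    for p in POLICY_CHUNK
}

# Naive replay: treat throughput recorded under A as if policy B
# would have observed the very same values.
naive_B = statistics.mean(t for q, _, t in sessions if q == "A")

# Ground truth for the same sessions had they run policy B instead.
true_B = statistics.mean(
    observed_throughput(c, POLICY_CHUNK["B"])
    for q, c, _ in sessions if q == "A"
)

print("mean latent capacity per arm:", mean_cap)
print("naive replay estimate for B:", naive_B)
print("true counterfactual for B:  ", true_B)
```

In this toy setup the naive replay underestimates what policy B would have achieved, because the recorded throughput already reflects policy A's choices, while the latent capacity is what stays invariant across the randomized arms.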