Machine learning (ML) is a powerful tool to model the complexity of communication networks. As networks evolve, however, we cannot simply train a model once and deploy it forever: models must be retrained, a practice known as continual learning. Yet, to date, there is no established methodology to answer two key questions: With which samples should we retrain? And when should we retrain? We address these questions with Memento, a sample-selection system that maintains a training set of the "most useful" samples to maximize coverage of the sample space. Memento particularly benefits rare patterns -- the notoriously long "tail" in networking -- and enables a rational assessment of when retraining may help, namely when the coverage changes. We deployed Memento on Puffer, the live-TV streaming research project, and achieved a 14% reduction in stall time, a 3.5x improvement over random sample selection. Finally, Memento does not depend on a specific model architecture and is thus likely to benefit other ML-based networking applications.
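To illustrate the idea of coverage-based sample selection described above, the following is a minimal sketch, not Memento's actual algorithm: it uses farthest-point sampling, a common heuristic for maximizing coverage of a feature space, which naturally retains rare "tail" samples that random selection would likely discard. All names and the capacity parameter are hypothetical.

```python
import numpy as np

def select_coverage_samples(candidates, memory, capacity):
    """Greedily keep the `capacity` samples that best cover the
    feature space (hypothetical sketch, not Memento's algorithm).

    candidates, memory: 2-D arrays of feature vectors.
    Returns an array of at most `capacity` selected samples.
    """
    pool = np.vstack([memory, candidates]) if len(memory) else candidates
    if len(pool) <= capacity:
        return pool
    # Seed with the point farthest from the pool's mean, then
    # repeatedly add the point farthest from everything selected so
    # far (farthest-point sampling).
    seed = int(np.argmax(np.linalg.norm(pool - pool.mean(axis=0), axis=1)))
    selected = [seed]
    dists = np.linalg.norm(pool - pool[seed], axis=1)
    while len(selected) < capacity:
        nxt = int(np.argmax(dists))  # point worst covered so far
        selected.append(nxt)
        # Each point's distance to its nearest selected sample.
        dists = np.minimum(dists, np.linalg.norm(pool - pool[nxt], axis=1))
    return pool[selected]
```

Because the heuristic maximizes spread rather than density, a small cluster of rare samples far from the bulk of the data is almost guaranteed to be represented in the selection, which mirrors the "tail coverage" property the abstract attributes to Memento.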