Generative ML models are increasingly popular in networking for tasks such as telemetry imputation, prediction, and synthetic trace generation. Despite their capabilities, they suffer from two shortcomings: \emph{(i)} their output often visibly violates well-known networking rules, which undermines their trustworthiness; and \emph{(ii)} they are difficult to control, frequently requiring retraining even for minor changes. To address these limitations and unlock the benefits of generative models for networking, we propose a new paradigm for integrating explicit network knowledge, in the form of first-order logic rules, into ML models used for networking tasks. Rules capture well-known relationships among observed signals, e.g., that increased latency precedes packet loss. While the idea is conceptually straightforward, its realization is challenging: networking knowledge is rarely formalized into rules, and naively injecting rules into ML models often hampers their effectiveness. This paper introduces NetNomos, a multi-stage framework that \emph{(i)} learns rules directly from data (e.g., measurements); \emph{(ii)} filters them to select semantically meaningful ones; and \emph{(iii)} enforces them through collaborative generation between an ML model and a Satisfiability Modulo Theories (SMT) solver. We show that NetNomos learns diverse, meaningful rules from four real-world datasets and is 1.6--6.5$\times$ more scalable than DuoAI, a state-of-the-art (SOTA) rule-learning method. By enforcing these rules on a generic GPT-2 model, NetNomos matches or even surpasses the performance of specialized SOTA systems such as Zoom2Net and NetShare across three networking tasks: telemetry imputation, traffic forecasting, and synthetic data generation.