Closed-loop network monitoring and orchestration increasingly require semantic interpretation of live telemetry, not just raw counter collection. However, dynamic cloud-edge environments change both the active node set and the monitoring query at runtime, while control loops demand bounded, millisecond-scale responses. We introduce a latent predictive state estimator (LPSE) for dynamic network monitoring and orchestration, built on latent predictive learning over streaming telemetry. The framework converts variable-cardinality node telemetry into topology-adaptive temporal representations, fuses them with monitoring questions, and returns bounded answers drawn from a semantic codebook rather than generating text autoregressively. This design enables fixed-cost, single-pass inference while preserving semantic interpretability. Because it operates on permutation-invariant, slot-routed node representations keyed by stable identity, the model keeps a fixed input space and generalizes to node addition, removal, and reordering without retraining. Experiments on a multi-node Kubernetes cluster show a semantic prediction accuracy of 82.42%, with approximately 41$\times$ lower mean inference latency and a 15$\times$ smaller memory footprint than a deployable 4B LLM endpoint.
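To make the slot-routing and codebook ideas concrete, the following is a minimal sketch and not the LPSE implementation. It assumes that a hash of the node name stands in for slot routing by stable identity, that colliding nodes are averaged within a slot, and that a nearest-prototype lookup stands in for the learned codebook head. The names `NUM_SLOTS`, `slot_for`, `route_to_slots`, `pool`, `answer`, and `CODEBOOK`, the two-feature layout (CPU, memory), and all numeric values are hypothetical.

```python
import hashlib
import math

NUM_SLOTS = 8  # assumed fixed slot budget; not a value from the paper


def slot_for(node_id, num_slots=NUM_SLOTS):
    """Map a stable node identity to a slot index, independent of arrival order."""
    digest = hashlib.sha256(node_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slots


def route_to_slots(telemetry, num_features, num_slots=NUM_SLOTS):
    """Turn a variable-length list of (node_id, features) into fixed-size slots.

    Returns (slots, mask): slots always has num_slots rows, and mask marks
    which slots are occupied. Nodes that collide in a slot are averaged.
    """
    buckets = [[] for _ in range(num_slots)]
    for node_id, features in telemetry:
        buckets[slot_for(node_id, num_slots)].append(features)
    slots, mask = [], []
    for bucket in buckets:
        mask.append(1 if bucket else 0)
        if bucket:
            # math.fsum tracks partial sums exactly, which keeps the mean
            # insensitive to the order in which nodes were reported.
            slots.append([math.fsum(col) / len(bucket) for col in zip(*bucket)])
        else:
            slots.append([0.0] * num_features)
    return slots, mask


def pool(slots, mask):
    """Masked mean over occupied slots: a fixed-length cluster state."""
    active = [s for s, m in zip(slots, mask) if m]
    if not active:
        return [0.0] * len(slots[0])
    return [math.fsum(col) / len(active) for col in zip(*active)]


# Hypothetical semantic codebook: label -> prototype (cpu, memory) state.
CODEBOOK = {
    "healthy": [0.2, 0.2],
    "cpu_saturated": [0.9, 0.3],
    "memory_pressure": [0.3, 0.9],
}


def answer(state, codebook=CODEBOOK):
    """Return the label of the nearest prototype: one of a bounded answer set."""
    return min(codebook, key=lambda label: math.dist(state, codebook[label]))


nodes = [
    ("node-a", [0.95, 0.30]),
    ("node-b", [0.85, 0.25]),
    ("node-c", [0.90, 0.35]),
]
slots, mask = route_to_slots(nodes, num_features=2)
print(answer(pool(slots, mask)))
```

In this sketch, adding, removing, or reordering nodes changes only which slots are occupied; the input shape stays at `NUM_SLOTS` rows, and the answer is always one of the codebook labels, so inference cost does not grow with answer length as it would with autoregressive decoding.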