Reliable Microservice Tail Latency Prediction via Decoupled Dual-Stream Learning and Gradient Modulation

Microservice architectures enable scalable cloud-native applications, but their distributed nature makes strict Service Level Objectives hard to maintain. Accurately predicting window-level P95 tail latency remains difficult because it depends on complex interactions between software workload propagation and infrastructure resource limits. Existing predictive models struggle to capture these dynamics: they do not explicitly separate traffic metrics from resource metrics, which leads to misaligned feature representations. Moreover, the unified architectures of prior approaches fail to isolate cascading service dependencies from localized processing capacity. As a result of this entanglement, joint training suffers from an optimization imbalance in which resource features converge faster and dominate gradient updates, preventing the model from learning the underlying software topology. To address these challenges, we propose USRFNet, a dual-stream framework that separates the modeling of demand and capacity. The framework uses a Graph Neural Network to model the spatial interactions of traffic workloads across software-level service dependencies, and a gating MLP to independently extract infrastructure-level resource dynamics. It then integrates the two representations through hierarchical tensor fusion. To resolve the training imbalance, we introduce a Reliability-Aware Gradient Modulation strategy that dynamically rescales gradients based on the generalization ratio of each data stream. Experiments on three large-scale real-world benchmarks demonstrate that USRFNet outperforms state-of-the-art methods in prediction accuracy. Specifically, compared to the best-performing baselines, the proposed framework achieves relative MAPE reductions ranging from 15.62% to 26.11% across the evaluated datasets.
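To make the prediction target and the reported metric concrete, the following minimal Python sketch computes a window-level P95 latency, the MAPE between actual and predicted values, and a relative MAPE reduction. It is illustrative only: the function names, the nearest-rank percentile definition, the non-overlapping windows, and all numbers are assumptions chosen for the example, not details taken from the abstract or results of USRFNet.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of latencies (assumed definition)."""
    ordered = sorted(samples)
    # Integer form of ceil(0.95 * n), which avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def window_p95(latencies, window):
    """P95 of each consecutive, non-overlapping window of `window` samples.

    A trailing partial window is dropped (an assumption of this sketch).
    """
    return [
        p95(latencies[i:i + window])
        for i in range(0, len(latencies) - window + 1, window)
    ]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def relative_reduction(baseline_mape, proposed_mape):
    """Relative MAPE reduction of a proposed model over a baseline, in percent."""
    return 100.0 * (baseline_mape - proposed_mape) / baseline_mape


# Hypothetical latencies (ms): two windows of 20 requests each.
latencies = list(range(1, 41))
targets = window_p95(latencies, window=20)

# Hypothetical MAPE values for a baseline and a proposed model.
improvement = relative_reduction(baseline_mape=8.0, proposed_mape=6.0)
```

Under these definitions, a relative reduction compares two MAPE values against each other, so a drop from a baseline MAPE of 8% to 6% counts as a 25% relative reduction rather than a 2-point one.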
