Detecting QoS anomalies in 5G user planes requires fine-grained per-flow visibility, but existing telemetry approaches face a fundamental trade-off. Coarse per-class counters are lightweight but mask transient and per-flow anomalies, while per-packet telemetry postcards provide full visibility at prohibitive cost that grows linearly with line rate. Selective postcard schemes reduce overhead but miss anomalies that fall below configured thresholds or occur during brief intervals. We present Kestrel, a sketch-based telemetry system for 5G user planes that provides fine-grained visibility into key metric distributions such as latency tails and inter-arrival times at a fraction of the cost of per-packet postcards. Kestrel extends Count-Min Sketch with histogram-augmented buckets and per-queue partitioning, which compress per-packet measurements into compact summaries while preserving anomaly-relevant signals. We develop formal detectability guarantees that account for sketch collisions, yielding principled sizing rules and binning strategies that maximize anomaly separability. Our evaluations on a 5G testbed with Intel Tofino switches show that Kestrel achieves 10% better detection accuracy than existing selective postcard schemes while reducing export bandwidth by 10x.
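To make the idea of histogram-augmented buckets with per-queue partitioning concrete, the following is a minimal software sketch, not Kestrel's data-plane implementation. It assumes that each Count-Min Sketch bucket holds a small fixed-bin histogram, that each queue has its own independent table, and that a per-flow estimate takes the per-bin minimum across rows. The class name `HistogramSketch`, its parameters, the bin edges, and the use of BLAKE2b as the row hash are all hypothetical choices made for illustration.

```python
import bisect
import hashlib


class HistogramSketch:
    """Count-Min Sketch whose buckets hold a histogram instead of one counter.

    Illustrative only: names, hashing, and layout are assumptions.
    """

    def __init__(self, depth, width, bin_edges, num_queues):
        self.depth = depth
        self.width = width
        self.bin_edges = list(bin_edges)
        # Values below the first edge and at or above the last edge get their own bins.
        self.num_bins = len(self.bin_edges) + 1
        # Per-queue partitioning: table[queue][row][column][bin].
        self.table = [
            [[[0] * self.num_bins for _ in range(width)] for _ in range(depth)]
            for _ in range(num_queues)
        ]

    def _column(self, row, flow_key):
        # One hash function per row, derived by salting BLAKE2b with the row index.
        digest = hashlib.blake2b(
            flow_key.encode(), digest_size=8, salt=row.to_bytes(2, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, queue, flow_key, value):
        # Map the measurement (for example, latency in ms) to a bin, then
        # increment that bin in one bucket per row.
        b = bisect.bisect_right(self.bin_edges, value)
        for row in range(self.depth):
            self.table[queue][row][self._column(row, flow_key)][b] += 1

    def query(self, queue, flow_key):
        # Every row over-counts because of collisions, so the per-bin minimum
        # across rows is an upper bound on the flow's true bin count.
        buckets = [
            self.table[queue][row][self._column(row, flow_key)]
            for row in range(self.depth)
        ]
        return [min(bkt[b] for bkt in buckets) for b in range(self.num_bins)]


sketch = HistogramSketch(depth=3, width=64, bin_edges=[1, 2, 5, 10], num_queues=2)
for latency_ms in [0.5, 1.5, 1.5, 12.0]:
    sketch.update(queue=0, flow_key="flow-a", value=latency_ms)
print(sketch.query(queue=0, flow_key="flow-a"))
```

In this sketch, export cost depends on `depth * width * num_bins` counters per queue rather than on the number of packets, which is the trade-off the abstract describes. Bin placement determines how well a latency-tail shift can be separated from collision noise, so the edges used here are placeholders, not recommended values.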