We present a lightweight, interpretable decision framework for dynamic edge server selection in latency-critical applications that explicitly accounts for tail risk and switching stability. Each candidate server is characterised by a predictive mean and an uncertainty summary of its network latency, which are used to estimate the risk of service-level objective (SLO) violations and to guide selection. Risk is evaluated using a tight Normal approximation complemented by a conservative, distribution-free Cantelli bound, while percentile-based scoring coupled with hysteresis stabilises decisions and suppresses oscillatory switching under short-lived network fluctuations. Experimental results on a multi-server edge testbed with a strict SLO of $\tau = 0.5$\,s show that, compared to a mean-only baseline, the proposed approach reduces the deadline-miss rate from 39\% to 34\% and the switching frequency from 46\% to 5.5\% ($\approx$88\% relative reduction), while maintaining sub-SLO average latency ($\approx$0.45\,s). These results demonstrate that explicit risk evaluation combined with stability-preserving control enables practical and robust adaptive server selection in dynamic edge environments.
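The selection rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the hysteresis margin of 0.05\,s, and the 95th-percentile multiplier $z = 1.645$ are assumptions chosen for illustration, and the percentile score is assumed to take the Normal form $\mu + z\sigma$.

```python
import math

def normal_violation_risk(mu, sigma, tau):
    """Estimate P(latency > tau) assuming latency ~ Normal(mu, sigma^2)."""
    if sigma <= 0:
        return 0.0 if mu <= tau else 1.0
    z = (tau - mu) / sigma
    # Upper tail of the standard Normal, written with the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))

def cantelli_violation_bound(mu, sigma, tau):
    """Distribution-free upper bound on P(latency >= tau) (Cantelli's inequality)."""
    k = tau - mu
    if k <= 0:
        return 1.0  # the bound is uninformative when the mean is at or above tau
    return sigma ** 2 / (sigma ** 2 + k ** 2)

def percentile_score(mu, sigma, z=1.645):
    """Approximate upper latency percentile (assumed form: mu + z * sigma)."""
    return mu + z * sigma

def select_server(current, stats, margin=0.05, z=1.645):
    """Pick the lowest-scoring server, but keep the current one unless a
    rival beats it by more than `margin` seconds (hysteresis).

    `stats` maps a server name to its (mean, std) latency in seconds.
    """
    scores = {name: percentile_score(mu, sigma, z) for name, (mu, sigma) in stats.items()}
    best = min(scores, key=scores.get)
    if current in scores and scores[current] - scores[best] <= margin:
        return current
    return best

if __name__ == "__main__":
    tau = 0.5  # SLO in seconds, as in the abstract
    stats = {"edge-a": (0.40, 0.05), "edge-b": (0.38, 0.05)}  # hypothetical servers
    for name, (mu, sigma) in stats.items():
        print(name,
              round(normal_violation_risk(mu, sigma, tau), 4),
              round(cantelli_violation_bound(mu, sigma, tau), 4))
    print(select_server("edge-a", stats))
```

In this sketch the Cantelli bound is never smaller than the Normal estimate for the same inputs, which is why it serves as the conservative check. The hysteresis margin trades responsiveness for stability: a larger margin suppresses more switches but delays moving to a genuinely better server.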