In edge computing, users' service profiles must be migrated between servers as users move. Reinforcement learning (RL) frameworks have been proposed to make these migration decisions, and they are often trained on simulated data. However, existing RL frameworks overlook occasional server failures, which, although rare, impact latency-sensitive applications such as autonomous driving and real-time obstacle detection. Because these failures (rare events) are not adequately represented in historical training data, they pose a challenge for data-driven RL algorithms. As it is impractical to adjust the failure frequency in real-world applications for training, we introduce FIRE, a framework that adapts to rare events by training an RL policy in an edge computing digital twin environment. We propose ImRE, an importance sampling-based Q-learning algorithm that samples rare events in proportion to their impact on the value function. FIRE considers delay, migration, failure, and backup placement costs across individual and shared service profiles. We prove that ImRE is bounded and converges to optimality. Next, we introduce novel deep Q-learning (ImDQL) and actor-critic (ImACRE) versions of our algorithm to enhance scalability. We extend our framework to accommodate users with varying risk tolerances. Through trace-driven experiments, we show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
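To make the idea of importance-sampled Q-learning concrete, the following is a minimal tabular sketch, not the ImRE algorithm itself. It assumes a single-state toy problem, a fixed inflated failure rate in the simulator (`P_SIM`), and made-up cost values. ImRE, by contrast, chooses its sampling distribution from the rare events' impact on the value function, which is not reproduced here. All names (`P_TRUE`, `P_SIM`, `COSTS`, `likelihood_ratio`, `train`, `exact_q`) are hypothetical. The sketch draws failures more often than they occur in reality and multiplies each temporal-difference update by the likelihood ratio, so the learned values still reflect the true failure rate.

```python
import random

P_TRUE = 0.01   # assumed real-world per-step failure probability
P_SIM = 0.30    # assumed inflated failure probability used in the simulator
GAMMA = 0.5     # discount factor (illustrative)
ALPHA = 0.002   # constant learning rate (illustrative)

# Hypothetical per-step costs: (cost if the server survives, cost if it fails).
COSTS = {"no_backup": (1.0, 100.0), "backup": (1.5, 10.0)}


def likelihood_ratio(failed, p_true=P_TRUE, p_sim=P_SIM):
    """Weight that maps a sample drawn under p_sim back to p_true."""
    if failed:
        return p_true / p_sim
    return (1.0 - p_true) / (1.0 - p_sim)


def train(steps=100_000, seed=0):
    """Q-learning on a one-state toy problem with over-sampled failures."""
    rng = random.Random(seed)
    actions = list(COSTS)
    q = {a: 0.0 for a in actions}
    for _ in range(steps):
        action = rng.choice(actions)        # uniform exploration
        failed = rng.random() < P_SIM       # failure drawn at the inflated rate
        ok_cost, fail_cost = COSTS[action]
        cost = fail_cost if failed else ok_cost
        # Costs are minimized, so the bootstrap term uses min over actions.
        target = cost + GAMMA * min(q.values())
        # The likelihood ratio corrects for sampling failures too often.
        q[action] += ALPHA * likelihood_ratio(failed) * (target - q[action])
    return q


def exact_q():
    """Closed-form Q-values of the toy problem under the true failure rate."""
    expected = {a: (1.0 - P_TRUE) * ok + P_TRUE * bad
                for a, (ok, bad) in COSTS.items()}
    value = min(expected.values()) / (1.0 - GAMMA)
    return {a: c + GAMMA * value for a, c in expected.items()}


if __name__ == "__main__":
    print("learned:", train())
    print("exact:  ", exact_q())
```

Dropping the `likelihood_ratio` factor in this sketch would instead learn values for a world in which servers fail 30% of the time, which is why the correction matters when failure frequency is changed only inside the simulator.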