We study a decentralized collaborative requesting problem that aims to optimize the information freshness of time-sensitive clients in edge networks consisting of multiple clients, access nodes (ANs), and servers. Clients request content through ANs acting as gateways, without observing AN states or the actions of other clients. We define the reward as the reduction in age of information (AoI) resulting from a client's selection of an AN, and formulate the problem as a non-stationary multi-armed bandit. In this decentralized and partially observable setting, the reward process is history-dependent and coupled across clients, and its expected rewards exhibit both abrupt and gradual changes, rendering classical bandit approaches ineffective. To address these challenges, we propose the AGING BANDIT WITH ADAPTIVE RESET algorithm, which combines adaptive windowing with periodic monitoring to track evolving reward distributions. We establish theoretical guarantees showing that the proposed algorithm achieves near-optimal performance, and we validate these results through simulations.
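To make the idea of combining adaptive windowing with periodic monitoring concrete, the following is a minimal Python sketch of a generic windowed UCB policy with a reset rule. It is not the AGING BANDIT WITH ADAPTIVE RESET algorithm itself, whose details are not given in this abstract. The class name `WindowedResetUCB`, the helper `simulate`, the half-versus-half mean comparison used as a change test, and all parameter values (`window`, `threshold`, `monitor_every`) are illustrative assumptions. Each arm stands for an AN and each reward for an observed AoI reduction.

```python
import math
from collections import deque


class WindowedResetUCB:
    """Illustrative windowed UCB with a simple reset rule (hypothetical, not the paper's algorithm)."""

    def __init__(self, n_arms, window=40, threshold=0.5, monitor_every=10):
        self.n_arms = n_arms
        self.window = window                # max samples remembered per arm (assumed value)
        self.threshold = threshold          # mean shift that triggers a reset (assumed value)
        self.monitor_every = monitor_every  # period of forced monitoring pulls (assumed value)
        self.buffers = [deque(maxlen=window) for _ in range(n_arms)]
        self.t = 0
        self.resets = 0

    def select(self):
        self.t += 1
        # Play any arm that currently has no samples.
        for arm, buf in enumerate(self.buffers):
            if not buf:
                return arm
        # Periodic monitoring: force a round-robin pull so that changes
        # on rarely selected arms can still be observed.
        if self.t % self.monitor_every == 0:
            return (self.t // self.monitor_every) % self.n_arms
        total = sum(len(buf) for buf in self.buffers)

        def ucb_index(arm):
            buf = self.buffers[arm]
            mean = sum(buf) / len(buf)
            return mean + math.sqrt(2.0 * math.log(total) / len(buf))

        return max(range(self.n_arms), key=ucb_index)

    def update(self, arm, reward):
        buf = self.buffers[arm]
        buf.append(reward)
        half = len(buf) // 2
        if half < 5:
            return  # too few samples to compare the two halves
        samples = list(buf)
        old_mean = sum(samples[:half]) / half
        new_mean = sum(samples[half:]) / (len(samples) - half)
        # Adaptive reset: if the recent half differs markedly from the
        # older half, drop the older half and keep only recent samples.
        if abs(new_mean - old_mean) > self.threshold:
            self.buffers[arm] = deque(samples[half:], maxlen=self.window)
            self.resets += 1


def simulate(policy, reward_fn, horizon):
    """Run the policy for `horizon` rounds; reward_fn(t, arm) returns the reward."""
    choices = []
    for t in range(1, horizon + 1):
        arm = policy.select()
        policy.update(arm, reward_fn(t, arm))
        choices.append(arm)
    return choices
```

In this sketch the bounded buffer handles gradual drift by forgetting old samples, while the reset rule reacts to abrupt changes. The forced monitoring pulls matter because a purely greedy index policy may stop sampling an arm whose reward has since improved.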