Cooperative edge caching in overlapping zones couples Base Station (BS) decisions, making content replacement sensitive to spatial topology and temporal reuse. Conventional heuristics are myopic, while Deep Reinforcement Learning relies on brittle numerical representations and requires prohibitively expensive retraining whenever the topology or traffic changes. This paper studies a centralized, cooperative multi-BS cache-replacement controller driven by a Large Language Model (LLM) within a deterministic text-to-action loop. At each time slot, the global cache state is rendered into a prompt that encapsulates each BS's inventory, deduplicated requests, and multi-scale frequency summaries. The LLM generates one decision line per BS. A strict parser and feasibility checker then either accept the joint action or fall back to an all-BS NoOp action. We align the LLM via two-stage training: Supervised Fine-Tuning on look-ahead expert trajectories to acquire action syntax and a robust initialization, followed by Group Relative Policy Optimization. The second stage uses an 'opportunity-aware' reward whose primary signal is the multi-step cooperative hit-rate gain relative to a NoOp baseline, combined with penalties for invalid outputs. We focus on reactive replacement of equal-sized files, with at most one replacement per BS per slot and insertions restricted to currently requested files. Evaluated on identical request traces and association graphs, our orchestrator approaches a single-step exhaustive-search reference (0.610 vs. 0.617 in a 5-BS scenario), surpasses classical baselines (+4.1% over Least Frequently Used), and exhibits robust zero-shot transfer across cache capacity, library size, popularity skewness, and user density. Code is available at https://github.com/gracefulning/CoopLLM-Cache.
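The parse-then-check step with an all-BS NoOp fallback can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration, not taken from the paper or its repository: the line grammar (`BS<i>: NOOP` or `BS<i>: REPLACE <evict> WITH <insert>`), the function name `parse_joint_action`, and the representation of caches and per-BS current requests as sets of integer file IDs. The feasibility rules follow the constraints stated in the abstract: one decision line per BS, at most one replacement per BS, and insertions restricted to current requests.

```python
import re
from typing import List, Optional, Set, Tuple

# Hypothetical line grammar; the abstract does not specify the actual syntax.
#   "BS<i>: NOOP"   or   "BS<i>: REPLACE <evict_id> WITH <insert_id>"
LINE_RE = re.compile(r"BS(\d+): (?:NOOP|REPLACE (\d+) WITH (\d+))")

# None means NoOp; otherwise (file to evict, file to insert).
Action = Optional[Tuple[int, int]]


def parse_joint_action(
    text: str,
    caches: List[Set[int]],
    requests: List[Set[int]],
) -> List[Action]:
    """Parse one decision line per BS; return all-NoOp on any violation."""
    n = len(caches)
    noop: List[Action] = [None] * n

    lines = [ln.strip() for ln in text.strip().splitlines()]
    if len(lines) != n:  # exactly one line per BS is required
        return noop

    actions: List[Action] = []
    for i, line in enumerate(lines):
        m = LINE_RE.fullmatch(line)
        if m is None or int(m.group(1)) != i:  # syntax error or wrong BS order
            return noop
        if m.group(2) is None:  # the NOOP branch matched
            actions.append(None)
            continue
        evict, insert = int(m.group(2)), int(m.group(3))
        feasible = (
            evict in caches[i]            # can only evict a cached file
            and insert not in caches[i]   # inserting a cached file is pointless
            and insert in requests[i]     # insertions limited to current requests
        )
        if not feasible:
            return noop
        actions.append((evict, insert))
    return actions


if __name__ == "__main__":
    caches = [{1, 2}, {3, 4}]
    requests = [{5}, {3}]
    print(parse_joint_action("BS0: REPLACE 1 WITH 5\nBS1: NOOP", caches, requests))
    print(parse_joint_action("BS0: REPLACE 9 WITH 5\nBS1: NOOP", caches, requests))
```

Rejecting the whole joint action on a single bad line, rather than repairing it line by line, keeps the loop deterministic: the environment executes either exactly what the model wrote or nothing at all.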