Cloud-device collaborative recommendation partitions computation across the cloud and user devices: the cloud provides semantic user modeling, while the device leverages recent interactions and cloud semantic signals for privacy-preserving, responsive reranking. With large language models (LLMs) on the cloud, semantic user representations can improve sequential recommendation by capturing high-level intent. However, regenerating such representations via cloud LLM inference for every request is often infeasible at real-world scale. As a result, on-device reranking commonly reuses a cached cloud semantic user embedding across requests. We empirically identify a cloud semantic staleness effect: reused embeddings become progressively less aligned with the user's latest interactions, leading to measurable ranking degradation. Most existing LLM-enabled cloud-device recommenders are designed around on-demand cloud semantics, either assuming low-latency cloud LLM access or regenerating semantic embeddings per request. When per-request regeneration is infeasible and cached semantics must be reused, two technical challenges arise: (1) deciding when cached cloud semantics remain useful for on-device reranking, and (2) maintaining ranking quality when the cloud LLM cannot be invoked and only cached semantics are available. To address this gap, we introduce Semantic Calibration for LLM-enabled Cloud-Device Recommendation (SCaLRec). First, SCaLRec estimates the reliability of cached semantics under the user's latest interactions. Second, an on-device semantic calibration module adjusts the cached semantic embedding using up-to-date interaction evidence, without per-request cloud LLM involvement. Experiments on real-world datasets show that SCaLRec consistently improves recommendation performance over strong baselines under cloud semantic staleness.
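The two steps described above — estimating the reliability of a cached cloud embedding and then correcting it on-device — can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the paper's actual method: it uses cosine similarity between the cached embedding and a mean of recent on-device interaction embeddings as a reliability proxy, and a gated residual blend as the calibration step; the function names and the gating rule are hypothetical.

```python
import numpy as np

def reliability(cached_emb, recent_embs):
    """Reliability proxy (hypothetical): cosine similarity between the
    cached cloud embedding and the mean of recent interaction embeddings."""
    recent = recent_embs.mean(axis=0)
    num = float(cached_emb @ recent)
    den = np.linalg.norm(cached_emb) * np.linalg.norm(recent) + 1e-8
    return num / den

def calibrate(cached_emb, recent_embs):
    """Calibration sketch (hypothetical): gated residual update that
    blends the cached embedding toward the recent-interaction summary
    in proportion to its estimated staleness (1 - reliability)."""
    gate = 1.0 - max(0.0, reliability(cached_emb, recent_embs))
    recent = recent_embs.mean(axis=0)
    # high reliability -> keep the cached embedding mostly unchanged;
    # low reliability -> pull it toward the fresh on-device evidence
    return (1.0 - gate) * cached_emb + gate * recent
```

When the cached embedding already matches recent behavior, the gate closes and the embedding passes through untouched; as staleness grows, the correction grows, all without any cloud LLM call.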