Personalization has become crucial for adapting models to the diverse and evolving needs of users across cultural, temporal, and contextual dimensions. Existing methods often rely on centralized fine-tuning or static preference alignment within a single model, and they struggle to deliver personalization that is both real-time and high-quality under the resource and privacy constraints of personal devices. To address this challenge, we propose CoSteer, a collaborative framework that enables tuning-free, real-time personalization via decoding-time adaptation. CoSteer uses the logit differences between context-aware and context-agnostic local small models to steer a cloud-based large model, achieving effective personalization while preserving the large model's capabilities. Personalization is handled locally, with only final tokens sent to the cloud, which keeps the user's context private and the system efficient. Through extensive experiments across a wide range of tasks, we demonstrate that CoSteer generates high-quality personalized content while remaining computationally efficient. Our results highlight its robustness across models and environments, confirming its practical applicability in real-world scenarios.
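The steering idea described in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible combination rule: the difference between the context-aware and context-agnostic small-model logits is added to the large model's logits, scaled by a strength parameter. The function names, the `alpha` parameter, the additive rule, and the shared-vocabulary assumption are all hypothetical choices made for illustration; the abstract does not specify CoSteer's actual update rule.

```python
from typing import List


def steer_logits(
    large_logits: List[float],
    ctx_logits: List[float],
    base_logits: List[float],
    alpha: float = 1.0,
) -> List[float]:
    """Shift the large model's logits by the small models' logit difference.

    Assumption (not from the source): a plain additive rule,
    steered = large + alpha * (context_aware - context_agnostic),
    with all three models sharing one vocabulary.
    """
    if not (len(large_logits) == len(ctx_logits) == len(base_logits)):
        raise ValueError("all logit vectors must cover the same vocabulary")
    return [
        large + alpha * (ctx - base)
        for large, ctx, base in zip(large_logits, ctx_logits, base_logits)
    ]


def pick_token(logits: List[float]) -> int:
    """Greedy decoding: return the index of the highest logit."""
    return max(range(len(logits)), key=logits.__getitem__)


# Toy three-token vocabulary. The large model prefers token 0, while the
# user's context (visible only to the local small model) favours token 1.
large = [2.0, 1.0, 0.0]        # cloud large model, no user context
small_ctx = [0.0, 2.0, 0.0]    # local small model with user context
small_base = [0.0, 0.0, 0.0]   # local small model without user context

unsteered_token = pick_token(large)
steered_token = pick_token(steer_logits(large, small_ctx, small_base, alpha=1.0))
```

In this toy case the unsteered choice is token 0 and the steered choice is token 1, since the steered logits become `[2.0, 3.0, 0.0]`. Setting `alpha` to 0 recovers the large model's own logits. In a setup like the one the abstract describes, the combination step would run on the device, so the user context never leaves it and only the selected token is sent to the cloud.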