Large language models (LLMs) are typically aligned with population-level preferences, despite substantial variation across individual users. While many LLM personalization methods exist, the underlying structure of user-level personalization is often left implicit. We formalize user-level, prompt-independent personalization as a decomposition into two components: preference inference and conditioned generation. We advocate for a modular design that decouples these components; identify natural language as a generator-agnostic interface between them; and characterize generator-transferability as a key implication of modular personalization. Guided by this abstraction, we introduce POPI, a novel instantiation of modular personalization that parameterizes both preference inference and conditioned generation as shared LLMs. POPI jointly optimizes the two components under a unified preference optimization objective, using reinforcement learning as an optimization tool. Across multiple benchmarks, POPI consistently improves personalization performance while reducing context overhead. We further demonstrate that the learned natural-language preference summaries transfer effectively to frozen, off-the-shelf LLMs, including black-box APIs, providing empirical evidence of modularity and generator-transferability.
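The decomposition described above can be sketched as a two-stage pipeline: a preference-inference module maps user-specific signals to a natural-language summary, and any generator (including a frozen, off-the-shelf LLM) conditions on that summary through its prompt. The function names and the trivial summary below are illustrative assumptions, not POPI's actual implementation:

```python
def infer_preferences(user_history: list[str]) -> str:
    """Preference inference: compress user-specific signals into a short
    natural-language summary (a hard-coded stand-in here; POPI learns this
    with a shared LLM trained via reinforcement learning)."""
    return "Prefers concise, bullet-point answers with code examples."


def generate(prompt: str, preference_summary: str) -> str:
    """Conditioned generation: the summary enters through the prompt, which
    is what makes natural language a generator-agnostic interface -- any
    LLM, including a black-box API, can consume it unchanged."""
    return f"[system: {preference_summary}]\n[user: {prompt}]"


# A user's interaction history yields a compact summary, which then
# conditions generation for any downstream prompt.
history = ["Asked to shorten three previous answers."]
summary = infer_preferences(history)
output = generate("Explain mutexes.", summary)
```

Because the summary is plain text rather than, say, a soft embedding tied to one model, swapping `generate` for a different frozen generator requires no retraining of the inference module, which is the generator-transferability property the abstract highlights.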