PreferThinker: Reasoning-based Personalized Image Preference Assessment

Personalized image preference assessment aims to evaluate an individual user's image preferences using only a small set of reference images as prior information. Existing methods mainly focus on general preference assessment, training models on large-scale data to tackle well-defined tasks such as text-image alignment. However, these approaches struggle with personalized preference because user-specific data are scarce and hard to scale, and individual tastes are often diverse and complex. To overcome these challenges, we introduce a common preference profile that serves as a bridge across users, allowing large-scale user data to be leveraged for training profile prediction and for capturing complex personalized preferences. Building on this idea, we propose a reasoning-based personalized image preference assessment framework that follows a \textit{predict-then-assess} paradigm: it first predicts a user's preference profile from reference images, and then provides interpretable, multi-dimensional scores and assessments of candidate images based on the predicted profile. To support this, we first construct a large-scale Chain-of-Thought (CoT)-style personalized assessment dataset annotated with diverse user preference profiles and high-quality CoT-style reasoning, enabling explicit supervision of structured reasoning. Next, we adopt a two-stage training strategy: a cold-start supervised fine-tuning phase that equips the model with structured reasoning capabilities, followed by reinforcement learning that incentivizes the model to explore more reasonable assessment paths and improves generalization. Furthermore, we propose a similarity-aware prediction reward to encourage more accurate prediction of the user's preference profile, which in turn facilitates the exploration of more reasonable assessments. Extensive experiments demonstrate the superiority of the proposed method.
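To make the idea of a similarity-aware prediction reward concrete, the following is a minimal sketch and not the method's actual definition. It assumes a preference profile can be represented as a set of short text tags and uses Jaccard overlap as a stand-in similarity measure. The function names `jaccard_similarity` and `prediction_reward`, and the example tags, are hypothetical and do not come from the abstract.

```python
from typing import Iterable


def jaccard_similarity(predicted: Iterable[str], target: Iterable[str]) -> float:
    """Overlap between two tag sets, in [0, 1] (assumed similarity measure)."""
    p, t = set(predicted), set(target)
    if not p and not t:
        return 1.0  # two empty profiles are treated as identical
    return len(p & t) / len(p | t)


def prediction_reward(predicted: Iterable[str], target: Iterable[str]) -> float:
    """Hypothetical reward: partial credit proportional to profile overlap,
    instead of an all-or-nothing exact-match signal."""
    return jaccard_similarity(predicted, target)


if __name__ == "__main__":
    # Illustrative tags only; real profiles may be structured differently.
    target = ["warm colors", "minimalist", "landscape"]
    close_guess = ["warm colors", "minimalist", "portrait"]
    far_guess = ["monochrome", "abstract"]
    print(prediction_reward(close_guess, target))
    print(prediction_reward(far_guess, target))
```

The design point this sketch illustrates is that a graded reward can distinguish a nearly correct profile from an unrelated one, which an exact-match reward cannot do.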
