Personalized image aesthetics assessment (PIAA) aims to predict an individual user's subjective rating of an image, which requires modeling user-specific aesthetic preferences. Existing methods learn these preferences from a user's historical ratings and therefore struggle when no such ratings are available. We address this zero-shot setting with a profile-based personalization paradigm that uses user profiles as contextual signals. We introduce P-MLLM, a profile-aware multimodal LLM that augments a frozen LLM with selective fusion modules. These modules selectively integrate visual information into the model's evolving hidden states during profile-conditioned reasoning, so that visual information is incorporated in a profile-aware manner. Experiments on recent PIAA benchmarks show that P-MLLM achieves competitive zero-shot performance and remains effective even with coarse profile information, highlighting the potential of profile-based personalization for zero-shot PIAA.
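The abstract does not specify the internal form of the selective fusion modules. As one possible reading, the sketch below treats a fusion module as gated cross-attention: each hidden state attends over visual tokens, and a scalar gate computed from the hidden state and the attended visual context decides how much of that context is added back. The function `selective_fusion` and the parameters `gate_weights` and `gate_bias` are hypothetical names chosen for illustration, and the gated cross-attention design is itself an assumption, not the method of P-MLLM.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    # Numerically stable for large positive and negative inputs.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def selective_fusion(
    hidden: List[Vector],    # T hidden states from the (frozen) LLM, each of size d
    visual: List[Vector],    # N visual tokens, each of size d
    gate_weights: Vector,    # hypothetical learned gate parameters, size 2 * d
    gate_bias: float,        # hypothetical learned gate bias
) -> List[Vector]:
    """Assumed gated cross-attention: hidden + gate * attended visual context."""
    d = len(hidden[0])
    scale = 1.0 / math.sqrt(d)
    fused = []
    for h in hidden:
        # Attention of this hidden state over all visual tokens.
        attn = softmax([dot(h, v) * scale for v in visual])
        context = [sum(a * v[i] for a, v in zip(attn, visual)) for i in range(d)]
        # Scalar gate in (0, 1) computed from the hidden state and the context.
        gate = sigmoid(dot(gate_weights, h + context) + gate_bias)
        # Residual update: the frozen LLM's own state is kept, visual info is added.
        fused.append([hi + gate * ci for hi, ci in zip(h, context)])
    return fused
```

In this sketch only the gate parameters would be trained while the LLM weights stay fixed. A strongly negative `gate_bias` drives the gate toward zero, so the hidden states pass through almost unchanged, which is one simple way to make visual integration controllable.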