Clothing recommendation does more than generate personalized outfits; it is also a medium for aesthetic guidance. However, existing methods rely predominantly on user-item-outfit interaction behaviors and overlook explicit representations of clothing aesthetics. To bridge this gap, we present AesRec, a benchmark dataset with systematic quantitative aesthetic annotations that enables the development of aesthetics-aligned recommendation systems. Grounded in professional apparel quality standards and fashion aesthetic principles, we define a multidimensional set of indicators. At the item level, six dimensions are assessed independently: silhouette, chromaticity, materiality, craftsmanship, wearability, and item-level impression. At the outfit level, the evaluation retains the first five dimensions and adds three distinct metrics, stylistic synergy, visual harmony, and outfit-level impression, to capture the collective aesthetic impact. Because Vision-Language Models (VLMs) show increasingly human-like proficiency in multimodal understanding and interaction, we use them for large-scale aesthetic scoring. We validate human-machine consistency on a fashion dataset, confirming the reliability of the generated ratings. Experimental results on AesRec further demonstrate that integrating quantified aesthetic information into clothing recommendation models can provide aesthetic guidance for users while still meeting their personalized requirements.
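The two indicator sets described above can be written down as a small schema. The following is a minimal sketch, not the dataset's actual format: the dimension names come from the abstract, while the identifiers (`ITEM_DIMENSIONS`, `OUTFIT_DIMENSIONS`, `validate_scores`) and the use of plain numeric scores are illustrative assumptions.

```python
# Dimension names follow the abstract; every identifier here is hypothetical.
SHARED_DIMENSIONS = (
    "silhouette",
    "chromaticity",
    "materiality",
    "craftsmanship",
    "wearability",
)

# Item level: the five shared dimensions plus an item-level impression.
ITEM_DIMENSIONS = SHARED_DIMENSIONS + ("item_impression",)

# Outfit level: the five shared dimensions plus three outfit-specific ones.
OUTFIT_DIMENSIONS = SHARED_DIMENSIONS + (
    "stylistic_synergy",
    "visual_harmony",
    "outfit_impression",
)


def validate_scores(scores: dict, dimensions: tuple) -> dict:
    """Check that a score record covers exactly the expected dimensions.

    The score scale is not stated in the abstract, so values are only
    converted to float and not range-checked.
    """
    missing = set(dimensions) - set(scores)
    unexpected = set(scores) - set(dimensions)
    if missing or unexpected:
        raise ValueError(
            f"missing={sorted(missing)}, unexpected={sorted(unexpected)}"
        )
    return {name: float(scores[name]) for name in dimensions}
```

Under these assumptions, an item record carries six scores and an outfit record carries eight, and a record scored against the wrong level is rejected rather than silently accepted.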