Fashion understanding requires both visual perception and expert-level reasoning about style, occasion, compatibility, and outfit rationale. However, existing fashion datasets remain fragmented and task-specific: they typically focus on item attributes, outfit co-occurrence, or weak textual supervision, and thus provide limited support for holistic outfit understanding. In this paper, we introduce FashionStylist, an expert-annotated benchmark for holistic fashion understanding. Constructed through a dedicated fashion-expert annotation pipeline, FashionStylist provides professionally grounded annotations at both the item and outfit levels. It supports three representative tasks: outfit-to-item grounding, outfit completion, and outfit evaluation. Respectively, these tasks cover realistic item recovery from complex outfits with layering and accessories, compatibility-aware composition beyond co-occurrence matching, and expert-level assessment of style, season, occasion, and overall coherence. Experimental results show that FashionStylist serves not only as a unified benchmark for multiple fashion tasks, but also as an effective training resource for improving grounding, completion, and outfit-level semantic evaluation in fashion systems based on multimodal large language models (MLLMs).