Fashion understanding requires both visual perception and expert-level reasoning about style, occasion, compatibility, and outfit rationale. However, existing fashion datasets remain fragmented and task-specific: they typically focus on item attributes, outfit co-occurrence, or weak textual supervision, and thus provide limited support for holistic outfit understanding. In this paper, we introduce FashionStylist, an expert-annotated benchmark for holistic fashion understanding. Constructed through a dedicated fashion-expert annotation pipeline, FashionStylist provides professionally grounded annotations at both the item and outfit levels. It supports three representative tasks: outfit-to-item grounding, outfit completion, and outfit evaluation. Respectively, these tasks cover realistic item recovery from complex outfits with layering and accessories, compatibility-aware composition beyond co-occurrence matching, and expert-level assessment of style, season, occasion, and overall coherence. Experimental results show that FashionStylist serves not only as a unified benchmark for multiple fashion tasks, but also as an effective training resource for improving grounding, completion, and outfit-level semantic evaluation in fashion systems based on multimodal large language models (MLLMs).