Clothing segmentation and fine-grained attribute recognition are challenging tasks at the intersection of computer vision and fashion: given an input image of a person, the goal is to segment every clothing instance in the ensemble and to recognize the detailed attributes of each clothing product. Many new models have been developed for these tasks in recent years; nevertheless, segmentation accuracy remains unsatisfactory for layered clothing and for fashion products that appear at different scales. In this paper, we propose a new DEtection TRansformer (DETR) based method that segments ensemble clothing instances and recognizes their fine-grained attributes with high accuracy. The model introduces a \textbf{multi-layered attention module} that aggregates features of different scales, determines the components of a single instance at each scale, and merges them together. We train our model on the Fashionpedia dataset and demonstrate that our method surpasses state-of-the-art (SOTA) models on layered clothing segmentation and fine-grained attribute recognition.
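To make the idea of aggregating and merging features across scales concrete, the following is a minimal sketch of one generic way a single instance query could attend over tokens drawn from several feature scales. It is an illustration under stated assumptions, not the module proposed in this paper: the function names `softmax` and `multi_scale_attention`, the use of plain scaled dot-product attention, and the random toy features are all hypothetical choices made for the example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_scale_attention(query, scales):
    """Attend from one instance query over tokens of several scales.

    query  : list of d floats (hypothetical instance query).
    scales : list of scales; each scale is a list of tokens,
             and each token is a list of d floats.
    Returns (merged, per_scale): the attention-weighted feature and
    the share of attention mass that fell on each scale.
    """
    d = len(query)
    # Flatten all scales into one token sequence, remembering the origin scale.
    tokens = [tok for scale in scales for tok in scale]
    scale_ids = [i for i, scale in enumerate(scales) for _ in scale]

    # Scaled dot-product score between the query and every token.
    scores = [
        sum(q * t for q, t in zip(query, tok)) / math.sqrt(d)
        for tok in tokens
    ]
    weights = softmax(scores)

    # Merge: attention-weighted sum of tokens from all scales.
    merged = [
        sum(w * tok[k] for w, tok in zip(weights, tokens))
        for k in range(d)
    ]

    # How much each scale contributed to this instance.
    per_scale = [0.0] * len(scales)
    for w, i in zip(weights, scale_ids):
        per_scale[i] += w
    return merged, per_scale


# Toy usage: three scales with 64, 16, and 4 tokens of dimension 8.
random.seed(0)
d = 8
scales = [
    [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    for n in (64, 16, 4)
]
query = [random.gauss(0, 1) for _ in range(d)]
merged, per_scale = multi_scale_attention(query, scales)
```

In this sketch, `per_scale` sums to one and indicates how much each scale contributes to the instance, while `merged` is the single feature obtained by combining them. A real DETR-style model would use learned projections, multiple heads, and backbone features rather than random vectors.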