We introduce text2fabric, a novel dataset that links free-text descriptions to various fabric materials. The dataset comprises 15,000 natural language descriptions associated with 3,000 corresponding images of fabric materials. Traditionally, material descriptions come in the form of tags or keywords, which limits their expressivity, presupposes knowledge of the appropriate vocabulary, and ultimately leads to a fragmented description system. Therefore, we study the use of free text as a more appropriate way to describe material appearance, taking fabrics as a use case because they are a common item that non-experts often deal with. Based on an analysis of the dataset, we identify a compact lexicon, a set of attributes, and a key structure that emerge from the descriptions. This allows us to understand accurately how people describe fabrics and to outline directions for generalization to other types of materials. We also show that our dataset enables specializing large vision-language models such as CLIP, creating a meaningful latent space for fabric appearance and significantly improving applications such as fine-grained material retrieval and automatic captioning.
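To make the retrieval application concrete, the following is a minimal sketch of text-to-image retrieval in a shared latent space, ranking images by cosine similarity to a query embedding. It is not the authors' implementation: the three-dimensional vectors, the image names, and the helper functions `normalize`, `cosine_similarity`, and `retrieve` are hypothetical stand-ins. In practice the embeddings would come from the text and image encoders of a vision-language model such as CLIP.

```python
import math


def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a_n, b_n = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def retrieve(query_embedding, image_embeddings, top_k=2):
    """Return the top_k (name, score) pairs, highest similarity first."""
    scored = [
        (name, cosine_similarity(query_embedding, emb))
        for name, emb in image_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; real ones would be produced by an image encoder.
image_embeddings = {
    "denim_blue": [0.9, 0.1, 0.0],
    "silk_red": [0.1, 0.9, 0.2],
    "wool_grey": [0.2, 0.2, 0.9],
}

# Hypothetical embedding of a free-text query such as "a sturdy blue woven fabric".
query_embedding = [0.8, 0.2, 0.1]

results = retrieve(query_embedding, image_embeddings, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Normalizing both vectors before the dot product mirrors how CLIP-style models typically compare text and image embeddings, so the ranking depends only on direction in the latent space and not on vector magnitude.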