Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization

Tsai-Shien Chen, Aliaksandr Siarohin, Gordon Guocheng Qian, Kuan-Chieh Jackson Wang, Egor Nemchinov, Moayed Haji-Ali, Riza Alp Guler, Willi Menapace, Ivan Skorokhodov, Anil Kag, Jun-Yan Zhu, Sergey Tulyakov

Source: arXiv; CVPR 2026. Project page: https://snap-research.github.io/omni-attribute

Visual concept personalization aims to transfer only specific image attributes, such as identity, expression, lighting, and style, into unseen contexts. However, existing methods rely on holistic embeddings from general-purpose image encoders, which entangle multiple visual factors and make it difficult to isolate a single attribute. This often leads to information leakage and incoherent synthesis. To address this limitation, we introduce Omni-Attribute, the first open-vocabulary image attribute encoder designed to learn high-fidelity, attribute-specific representations. Our approach jointly designs the data and model: (i) we curate semantically linked image pairs annotated with positive and negative attributes to explicitly teach the encoder what to preserve or suppress; and (ii) we adopt a dual-objective training paradigm that balances generative fidelity with contrastive disentanglement. The resulting embeddings prove effective for open-vocabulary attribute retrieval, personalization, and compositional generation, achieving state-of-the-art performance across multiple benchmarks.
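The dual-objective idea in (ii) can be illustrated with a toy loss that adds a reconstruction-style term to a contrastive term. This is only a sketch under stated assumptions, not the authors' implementation: the function names, the mean-squared-error stand-in for the generative objective, the InfoNCE-style contrastive form, the temperature, and the weighting factor are all hypothetical choices made for illustration, since the abstract does not specify any of them.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_term(pos_pair, neg_pair, temperature=0.1):
    """Hypothetical InfoNCE-style term (an assumption, not taken from the paper).

    pos_pair: embeddings of two images under an attribute they share.
    neg_pair: embeddings of the same two images under an attribute that differs.
    The loss is small when the positive pair is more similar than the negative pair.
    """
    s_pos = cosine(*pos_pair) / temperature
    s_neg = cosine(*neg_pair) / temperature
    # Numerically stable form of -log(exp(s_pos) / (exp(s_pos) + exp(s_neg))).
    m = max(s_pos, s_neg)
    return -(s_pos - m) + math.log(math.exp(s_pos - m) + math.exp(s_neg - m))


def generative_term(predicted, target):
    """Mean squared error, used here as a stand-in for a generative fidelity loss."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def dual_objective(predicted, target, pos_pair, neg_pair, weight=0.5):
    """Weighted sum of the two terms; `weight` is an assumed balancing factor."""
    return generative_term(predicted, target) + weight * contrastive_term(pos_pair, neg_pair)


if __name__ == "__main__":
    # Toy vectors: the positive pair is aligned, the negative pair is orthogonal.
    shared = ([1.0, 0.0], [1.0, 0.0])
    differing = ([1.0, 0.0], [0.0, 1.0])
    print(dual_objective([0.5, 0.5], [0.5, 0.5], shared, differing))
```

In this toy setup the contrastive term is close to zero when embeddings agree on the shared attribute and diverge on the differing one, and it grows when the roles are swapped, which is the preserve-or-suppress behaviour that the positive and negative annotations are meant to encourage.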
