Visual concept personalization aims to transfer only specific image attributes, such as identity, expression, lighting, and style, into unseen contexts. However, existing methods rely on holistic embeddings from general-purpose image encoders. These embeddings entangle multiple visual factors, which makes it difficult to isolate a single attribute and often leads to information leakage and incoherent synthesis. To address this limitation, we introduce Omni-Attribute, the first open-vocabulary image attribute encoder designed to learn high-fidelity, attribute-specific representations. Our approach co-designs the data and the model: (i) we curate semantically linked image pairs annotated with positive and negative attributes to explicitly teach the encoder what to preserve and what to suppress; and (ii) we adopt a dual-objective training paradigm that balances generative fidelity with contrastive disentanglement. The resulting embeddings are effective for open-vocabulary attribute retrieval, personalization, and compositional generation, and achieve state-of-the-art performance across multiple benchmarks.
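The abstract does not specify the exact form of the two training objectives. As an illustration only, the sketch below assumes the generative term is a mean-squared error between a predicted and a target signal (a stand-in for a diffusion denoising loss) and the contrastive term is an InfoNCE loss. In this InfoNCE term, the embedding of the paired image under a positive attribute is the positive, and embeddings under negative attributes are the negatives. The function names and the weights `lam` and `tau` are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def info_nce(anchor: Sequence[float], positive: Sequence[float],
             negatives: List[Sequence[float]], tau: float = 0.1) -> float:
    """Hypothetical contrastive term: pull the anchor toward the positive-attribute
    embedding and push it away from negative-attribute embeddings."""
    # Logit 0 is the positive pair; the remaining logits are negatives.
    logits = [cosine(anchor, positive) / tau]
    logits += [cosine(anchor, n) / tau for n in negatives]
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    # Cross-entropy with the positive as the target class.
    return log_sum - logits[0]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Hypothetical generative term: mean-squared reconstruction error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dual_objective_loss(pred, target, anchor, positive, negatives,
                        lam: float = 0.5, tau: float = 0.1) -> float:
    """Weighted sum of a generative (fidelity) term and a contrastive
    (disentanglement) term; `lam` trades one off against the other."""
    return mse(pred, target) + lam * info_nce(anchor, positive, negatives, tau)


if __name__ == "__main__":
    # Anchor aligned with the positive and orthogonal to the negative: small loss.
    print(info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]]))
    # Anchor aligned with the negative instead: large loss.
    print(info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]]))
```

In this sketch, raising `lam` emphasizes separating attributes over reconstruction quality. Where the paper places this balance in practice is not stated in the abstract.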