SEAL: Semantic-aware Single-image Sticker Personalization with a Large-scale Sticker-tag Dataset

Synthesizing a target concept from a single reference image is challenging in diffusion-based personalized text-to-image generation, particularly for sticker personalization where prompts often require explicit attribute edits. With only one reference, test-time fine-tuning (TTF) methods tend to overfit, producing \textit{visual entanglement}, where background artifacts are absorbed into the learned concept, and \textit{structural rigidity}, where the model memorizes reference-specific spatial configurations and loses contextual controllability. To address these issues, we introduce \textbf{SE}mantic-aware single-image sticker person\textbf{AL}ization (\textbf{SEAL}), a plug-and-play, architecture-agnostic adaptation module that integrates into existing personalization pipelines without modifying their U-Net-based diffusion backbones. SEAL applies three components during embedding adaptation: (1) a Semantic-guided Spatial Attention Loss, (2) a Split-merge Token Strategy, and (3) Structure-aware Layer Restriction. To support sticker-domain personalization with attribute-level control, we present StickerBench, a large-scale sticker image dataset with structured tags under a six-attribute schema (Appearance, Emotion, Action, Camera Composition, Style, Background). These annotations provide a consistent interface for varying context while keeping target identity fixed, enabling systematic evaluation of identity disentanglement and contextual controllability. Experiments show that SEAL consistently improves identity preservation while maintaining contextual controllability, highlighting the importance of explicit spatial and structural constraints during test-time adaptation. The code, StickerBench, and project page will be publicly released.
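To make the six-attribute schema concrete, the following is a minimal sketch of how one StickerBench-style tag record might be represented and turned into prompts that change a single attribute at a time. Only the six attribute names come from the abstract. The class name `StickerTags`, the helper functions, the `<sticker>` placeholder token, the prompt template, and the example tag values are all hypothetical and are not taken from the released dataset or code.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class StickerTags:
    """Hypothetical record for the six-attribute schema named in the abstract."""
    appearance: str
    emotion: str
    action: str
    camera_composition: str
    style: str
    background: str


def to_prompt(tags: StickerTags, placeholder: str = "<sticker>") -> str:
    """Join the non-empty attribute values after a placeholder identity token.

    The template is an assumption made for illustration only.
    """
    values = [getattr(tags, f.name) for f in fields(tags)]
    return ", ".join([f"a sticker of {placeholder}"] + [v for v in values if v])


def edit_attribute(tags: StickerTags, **changes: str) -> StickerTags:
    """Return a copy with only the named attributes changed."""
    return replace(tags, **changes)


# Illustrative values, not real dataset entries.
base = StickerTags(
    appearance="orange cat",
    emotion="happy",
    action="waving",
    camera_composition="full body",
    style="flat cartoon",
    background="white background",
)
edited = edit_attribute(base, emotion="crying")

print(to_prompt(base))
print(to_prompt(edited))
```

Because the record is immutable and each edit touches only the named fields, two prompts built this way differ in exactly the edited attribute, which is the kind of controlled context variation the schema is described as supporting.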
