Text-driven image style transfer has seen remarkable progress with methods leveraging cross-modal embeddings for fast, high-quality stylization. However, most existing pipelines assume a \emph{single} textual style prompt, limiting the range of artistic control and expressiveness. In this paper, we propose a novel \emph{multi-prompt style interpolation} framework that extends the recently introduced \textbf{StyleMamba} approach. Our method supports blending or interpolating among multiple textual prompts (e.g., ``cubism,'' ``impressionism,'' and ``cartoon''), allowing the creation of nuanced or hybrid artistic styles within a \emph{single} image. We introduce a \textit{Multi-Prompt Embedding Mixer} combined with \textit{Adaptive Blending Weights} to enable fine-grained control over the spatial and semantic influence of each style. Further, we propose a \emph{Hierarchical Masked Directional Loss} to refine region-specific style consistency. Experiments and user studies confirm that our approach outperforms single-prompt baselines and naive linear combinations of styles, achieving superior style fidelity, text-image alignment, and artistic flexibility, all while maintaining the computational efficiency offered by the state-space formulation.
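To make the blending idea concrete, the following is a minimal sketch of the mixing step under our own simplifying assumptions; the symbols $\mathbf{e}_i$, $w_i(x)$, and $\mathbf{e}_{\mathrm{mix}}$ are illustrative placeholders rather than the formal notation defined later in the paper. Each of the $N$ style prompts is encoded into a text embedding $\mathbf{e}_i$, and the Adaptive Blending Weights supply spatially varying coefficients that convexly combine these embeddings:
\begin{equation}
\mathbf{e}_{\mathrm{mix}}(x) \;=\; \sum_{i=1}^{N} w_i(x)\,\mathbf{e}_i,
\qquad
\sum_{i=1}^{N} w_i(x) = 1,\quad w_i(x) \ge 0 ,
\end{equation}
so that each spatial location $x$ of the stylized image can be steered toward a different mixture of the prompts, which is the sense in which the mixer provides both spatial and semantic control over each style's influence.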