Emotion is important for creating compelling virtual reality (VR) content. Although some generative methods have been applied to lower the barrier to creating emotionally rich content, they fail to capture nuanced emotional semantics or to provide the fine-grained control essential for immersive experiences. To address these limitations, we introduce EmoSpace, a novel framework for emotion-aware content generation that learns dynamic, interpretable emotion prototypes through vision-language alignment. We employ a hierarchical emotion representation with rich learnable prototypes that evolve during training, enabling fine-grained emotional control without requiring explicit emotion labels. We develop a controllable generation pipeline featuring multi-prototype guidance, temporal blending, and attention reweighting that supports diverse applications, including emotional image outpainting, stylized generation, and emotional panorama generation for VR environments. Our experiments demonstrate the superior performance of EmoSpace over existing methods in both qualitative and quantitative evaluations. Additionally, we present a comprehensive user study investigating how VR environments affect emotional perception compared to desktop settings. Our work facilitates immersive visual content generation with fine-grained emotion control and supports applications such as therapy, education, storytelling, artistic creation, and cultural preservation. Code and models will be made publicly available.