Olfactory cues can enhance immersion in interactive media, yet smell remains rare because it is difficult to author and to synchronize with dynamic video. Prior olfactory interfaces rely on designer-specified triggers and fixed event-to-odor mappings that do not scale to unconstrained content. This work examines whether semantic planning for smell is intelligible to people even before any physical scent is delivered. We present a video-to-scent planning pipeline that separates visual semantic extraction, performed by a vision-language model, from semantic-to-olfactory inference, performed by a large language model. Two survey studies compare system-generated scent plans with over-inclusive and naive baselines. Results show a consistent preference for plans that prioritize perceptually salient cues and align scent changes with visible actions, supporting semantic planning as a foundation for future olfactory media systems.
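The two-stage decomposition described above can be made concrete with a short sketch. The snippet below is a minimal illustration under assumed interfaces, not the paper's implementation: `extract_semantics` stands in for any vision-language model that tags video segments, `infer_scent_plan` stands in for a large language model that maps those tags to timed odor cues, and all names, data structures, and the toy tag-to-odor lookup are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start_s: float      # segment start time in the video (seconds)
    end_s: float        # segment end time (seconds)
    tags: list[str]     # semantic tags produced by the VLM stage

@dataclass
class ScentCue:
    start_s: float      # when to begin releasing the odor
    odor: str           # odor channel to trigger
    intensity: float    # relative intensity in [0, 1]

def extract_semantics(video_path: str) -> list[Segment]:
    """Stage 1 (hypothetical): a vision-language model segments the
    video and emits perceptually salient semantic tags per segment."""
    # Stub output standing in for a real VLM call on video_path.
    return [
        Segment(0.0, 4.0, ["campfire", "forest clearing"]),
        Segment(4.0, 9.0, ["coffee poured into a mug"]),
    ]

def infer_scent_plan(segments: list[Segment]) -> list[ScentCue]:
    """Stage 2 (hypothetical): an LLM maps semantic tags to a sparse,
    salience-prioritized scent plan aligned with visible actions."""
    plan: list[ScentCue] = []
    for seg in segments:
        # A real system would prompt an LLM with the tags; a toy
        # keyword lookup is enough to show the interface.
        for tag in seg.tags:
            if "campfire" in tag:
                plan.append(ScentCue(seg.start_s, "woodsmoke", 0.8))
            elif "coffee" in tag:
                plan.append(ScentCue(seg.start_s, "roasted coffee", 0.6))
    return plan

if __name__ == "__main__":
    for cue in infer_scent_plan(extract_semantics("demo.mp4")):
        print(f"{cue.start_s:>5.1f}s  {cue.odor}  intensity={cue.intensity}")
```

Keeping the two stages behind separate function boundaries mirrors the separation the abstract argues for: the visual stage can be swapped or re-run without touching the olfactory inference stage, and the scent plan remains a plain timed data structure that a future delivery device could consume.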