We propose MLLM-VADStory, a novel domain-knowledge-guided multimodal large language model (MLLM) framework for systematically quantifying video ad storylines and generating insights about them at scale. The framework is built on the core idea that ad narratives are structured by functional intent: each scene unit performs a distinct communicative function, delivering product- and brand-oriented information within seconds. MLLM-VADStory segments ads into functional units, classifies the function of each unit using a novel advertising-specific functional role taxonomy, and then aggregates the resulting functional sequences across ads to recover data-driven storyline structures. Applying the framework to 50,000 social media video ads across four industry subverticals, we find that story-based creatives improve video retention, and we recommend top-performing story arcs to guide advertisers in creative design. Our results demonstrate the value of using domain knowledge to guide MLLMs in generating scalable insights into video ad storylines, making the framework a versatile tool for understanding video creatives in general.
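The aggregation step can be illustrated with a minimal sketch. It assumes each ad has already been segmented and every unit assigned a functional role label. The role names (`hook`, `problem`, `solution`, `cta`, `product_demo`) and the helpers `collapse_repeats` and `top_story_arcs` are hypothetical placeholders, not the paper's taxonomy or code. The sketch merges consecutive units that share a role and counts the resulting arcs by frequency. The framework's actual aggregation, and its ranking of arcs by retention, may differ.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Sequence


def collapse_repeats(roles: Sequence[str]) -> tuple[str, ...]:
    """Merge consecutive units with the same role, e.g. (hook, hook, demo) -> (hook, demo)."""
    arc: list[str] = []
    for role in roles:
        if not arc or arc[-1] != role:
            arc.append(role)
    return tuple(arc)


def top_story_arcs(
    ads: Iterable[Sequence[str]], k: int = 3
) -> list[tuple[tuple[str, ...], int]]:
    """Count collapsed role sequences across ads and return the k most frequent arcs."""
    counts = Counter(collapse_repeats(roles) for roles in ads)
    return counts.most_common(k)


# Hypothetical per-ad role sequences (illustrative labels only, not real data).
example_ads = [
    ["hook", "problem", "solution", "cta"],
    ["hook", "hook", "problem", "solution", "cta"],
    ["product_demo", "cta"],
]

if __name__ == "__main__":
    print(top_story_arcs(example_ads, k=2))
    # → [(('hook', 'problem', 'solution', 'cta'), 2), (('product_demo', 'cta'), 1)]
```

Collapsing repeated roles first lets ads that differ only in how many scenes they spend on one function count as the same arc. A retention-based ranking could be layered on top by grouping per-ad retention metrics under each collapsed arc.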