Video generation models have achieved remarkable progress, particularly excelling in realistic scenarios; however, their performance degrades notably in imaginative scenarios. Imaginative prompts often involve rarely co-occurring concepts with long-distance semantic relationships, which fall outside training distributions. Existing methods typically apply test-time scaling to improve video quality, but their fixed search spaces and static reward designs limit adaptability to imaginative scenarios. To fill this gap, we propose ImagerySearch, a prompt-guided adaptive test-time search strategy that dynamically adjusts both the inference search space and the reward function according to semantic relationships in the prompt. This yields more coherent and visually plausible videos in challenging imaginative settings. To evaluate progress in this direction, we introduce LDT-Bench, the first benchmark dedicated to long-distance semantic prompts, comprising 2,839 diverse concept pairs and an automated protocol for assessing creative generation capabilities. Extensive experiments show that ImagerySearch consistently outperforms strong video generation baselines and existing test-time scaling approaches on LDT-Bench, and achieves competitive improvements on VBench, demonstrating its effectiveness across diverse prompt types. We will release LDT-Bench and our code to facilitate future research on imaginative video generation.