Visuals can enhance our experience of music by amplifying the emotions and messages it conveys. However, creating music visualizations is a complex, time-consuming, and resource-intensive process. We introduce Generative Disco, a generative AI system that helps users create music visualizations with large language models and text-to-video generation. The system lets users visualize music in intervals by helping them find prompts that describe the images each interval starts and ends on, and then interpolating between those images to the beat of the music. We introduce design patterns for improving these generated videos: transitions, which express shifts in color, time, subject, or style, and holds, which help focus the video on subjects. A study with professionals showed that transitions and holds formed a highly expressive framework that enabled them to build coherent visual narratives. We conclude by discussing the generalizability of these patterns and the potential of generated video for creative professionals.
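To make the idea of interpolating between an interval's start and end images "to the beat" more concrete, the following is a minimal illustrative sketch, not the system's actual implementation. It assumes beat timestamps are already known (for example, from a beat tracker) and that each image is represented by a numeric vector such as a latent or embedding. The names `beat_synced_weights` and `lerp`, the ease-out curve, and the default frame rate are all hypothetical choices made for illustration.

```python
from bisect import bisect_right


def beat_synced_weights(start, end, beats, fps=24):
    """Return one blend weight in [0, 1] per frame of the interval [start, end].

    Hypothetical scheme: progress is split evenly across the beats inside the
    interval, and within each beat-to-beat segment an ease-out curve makes the
    visual change fastest right after a beat.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    # Key times: interval start, any beats strictly inside it, interval end.
    keys = [start] + sorted(b for b in beats if start < b < end) + [end]
    segments = len(keys) - 1
    n_frames = int(round((end - start) * fps)) + 1
    weights = []
    for i in range(n_frames):
        t = min(start + i / fps, end)
        # Segment containing t; the final frame is clamped into the last segment.
        k = min(bisect_right(keys, t) - 1, segments - 1)
        u = (t - keys[k]) / (keys[k + 1] - keys[k])
        eased = 1 - (1 - u) ** 2  # ease-out within the segment
        weights.append((k + eased) / segments)
    return weights


def lerp(a, b, w):
    """Linearly blend two equal-length vectors (stand-ins for image latents)."""
    return [(1 - w) * x + w * y for x, y in zip(a, b)]


if __name__ == "__main__":
    # A 2-second interval with beats every 0.5 s, rendered at 4 frames per second.
    weights = beat_synced_weights(0.0, 2.0, [0.5, 1.0, 1.5], fps=4)
    start_vec, end_vec = [0.0, 0.0], [2.0, 4.0]
    frames = [lerp(start_vec, end_vec, w) for w in weights]
    print(len(frames), frames[0], frames[-1])
```

In a real pipeline, each blended vector would be passed to a generative model to render a frame. Plain Python lists are used here only to keep the sketch self-contained.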